Map Placement for City of Toronto Wayfinding Project

---HackOn(Data) Contest Challenge, Sep 10-11, 2016

Qian(John) Xie, Roland Sing, and Mingfei Cao

HackOnData is a free two-day event that brings together the Toronto data community to take a closer look at the data that touches our daily lives. Teams of Toronto's top data scientists and data engineers collaborated to generate practical insights from data provided by local companies, not-for profits and the government. Prior to the event, weekly workshops and challenges will help prepare participants by giving them the knowledge and hands-on experience required to ensure they can meaningfully participate. During the event, well-known mentors from Toronto and around the world engaged with participants to take their knowledge and skill to the next level. HackOn(Data) is the best platform available to local data talent and businesses to meet, collaborate, and exchange knowledge, experience and job opportunities!

Sponsers
TranQuant, flipp, wattpad, LoyaltyOne, amazon, Lightbend, GuruLink, Shopify

Partners
Toronto Apache Spark, scalator, Deep Learning Toronto, HackerNest, HacherNest Toronto Tech Socials, TechToronto, DMZ, City of Toronto, Toronto Public Library

1. Introduction

1.1 Background and Motivation

In 2011, City of Toronto launched the TO360 Wayfinding Project. The integrated multi-modal wayfinding strategy is comprised of pedestrian, vehicular, cyclying and transit wayfinding. The project is aimed to:

  • Enhance the overall image of Tornto as a destination
  • Increase visitors at key attractions, spending in the Greater Toronto Area, boost the local econnomy
  • Increase confidence to walk, reduce walk times, promote multi-modal transit and reduce auto use
  • Improve urban realm, sense of community, pedestrian safety, health and environment

The project is implemented in three phases:

  • Phase 1: Wayfinding strategies (2011 - 2012)
  • Phase 2: Pilot implementation (2014 - 2015)
  • Phase 3: City-wide roll out (2016 - 2017)

Right now, the project in in phase 3. In determining where wayfinding products are required, a number of factors were considered:

Existing Need - The implementation strategy prioritizes areas where a need for wayfinding currently exists based on:

  • having high densities of visitors who are unfamiliar with the city
  • having high pedestrian volumnes
  • having changes in mode of travel
  • being on a main street
  • being an area that is difficult to navigate
  • being close to hospitals, colleges or universities
  • being close to a city centre

Available Funding - Further, certain areas may be prioritized as project partners come forward with funding to implement the schem. Potential project partners include:

  • transit agencies
  • Business Improvment Areas
  • universities and health care campuses
  • attractions
  • city divisions
  • tourism organizations

For the HackOn(Data) event, City of Toronto have an interest in exploring a more data-driven methodology to determine the timing and geographic distribution of the required TO360 map assest upgrades. The data-driven methodology may help gain valuable insights from a different and novel perspective and help domain experts to make more effective and reliable map placement plan.

Reference: Toronto Wayfinding Strategy

1.2 Our Approach

We choose an expoloratory approach for this problem. We are not aiming to find a "perfect" solution by considering all needs and using all the available data. Instead, our goal is to build a prototype model using the a few of the most important data sources(provided by TranQuant and City of Toronto Open Data). If we can gain insights from the solution and the methodology is actionable, we can refine the prototype methodology by taking into consideration more needs, incooporating more data sources, and using more advanced algorithms.

Among the four aspects of the multi-modal wayfinding strategy, we focus on the pedestrian wayfinding. We chose to analyze two datasets from the City of Toronto, available on TranQuant:

  • Cultural Spaces: This dataset is a compilation of all spaces within the 44 City wards that were available for cultural use for a five year period.
  • Signalized Intersection Traffic and Pedestrians: This dataset contains traffic and pedestrian volume data collected at intersections where there are traffic signals from 1999 to 2015.

1.3 Overview of Solution Process

Aftering deciding to use the above two datasets, the study began with a single question: what criteria should we consider when choosing an intersection to place map?

In the begining, we considered the problem as an optimization problem. After some discussion, we decided this direction is possible but might be difficult to implement in a two-day contest. Then we did some exploratory data analysis by plotting the intersections and facilities geographic distribution on map. After seeing the plot, we thought clustering is a direction we should try. The logic is as follows:

  1. Clustering the facilites into small groups according to their geographic distance
  2. Find the geographic coordinates of each facility cluster centroid
  3. Select the closest three intersections to a facility cluster centroid
  4. Among the closest three intersections, choose the intersection with the highest pedestrian volume

The logic is heuristic but simple and easy to implement, so we managed to build a solution based on this logic within the two day time limit.

I tried two clustering algorithms: DBSCAN and KMeans. DBSCAN is density-based clustering while KMeans is centroid-based clustering

At this stage, I adopted KMeans algorithm. KMeans works better to spatially evenly divide our facilities into clusters according to their geographic coordinates. DBSCAN can provide another perspective of looking at the problem and we can explore that angle in the future.

1.4 Python Packages

  • Numpy
  • Scipy
  • Pandas
  • Matplotlib
  • scikit-learn
  • seaborn
  • folium
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.cm as cm
import seaborn as sns
import folium
from folium import plugins
from sklearn.cluster import DBSCAN, KMeans
from sklearn import metrics
from scipy import spatial
import time
%matplotlib inline
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)

2. Data Preparation

  • Pedestrian and Vehicle Volume Data of Major Intersections
    • ped_vol_2012.csv: pedestrian and vehicle volume data collected at some intersections in 2012
    • ped_vol_2013.csv: pedestrian and vehicle volume data collected at some intersections in 2013
    • ped_vol_2014.csv: pedestrian and vehicle volume data collected at some intersections in 2014
    • ped_vol_2015.csv: pedestrian and vehicle volume data collected at some intersections in 2015
    • signalizedTrafficPedestrianVolumes - pedestrian and vehicle volume data collected at some intersections from 1999 - 2012
  • Cultural Facility Data
    The same data was provided in three formats

    • MSFC_44_Wards_Complete_Final.csv: List of cultural facilities in Toronto. The file consists of the name of the facility, address, ward information, ownership of facilities that are available on a rental basis for cultural events. The two accompanying documents MSFC_Readme_1.csv and MSFC_Readme_2.csv give detailed describe the contents of the file and the schema(columns) of the file. The MSFC_44_Wards_Complete_Final.csv only provides postal code, no coordinates for facility.

    • Make_space_for_culture_mtm3.zip: with shape files of MTM 3 coordinate system and facility file MAKE_SPACE_FOR_CULTURE.dbf, which contains coordinates for facilities.

    • Make_space_for_culture_wgs84.zip: with shape files for WGS84 coordinate system and facility file MAKE_SPACE_FOR_CULTURE_WGS84.dbf, which contains coordinates for facilities.

2.1 Pedestrian and Vehicle Volume Data of Major Intersections

In [2]:
# Data collected in 2012
intersection_2012_df = pd.read_csv("./Data/ped_vol_2012.csv", header=None,
                                   names=['PX', 'main', 'midblock_route', 'side1_route', 'side2_route', 'activation_date',
                                    'latitude', 'longitude', 'count_date', '8hr_vel_vol', '8hr_ped_vol'],
                                   skiprows=1)
intersection_2012_df.head()
Out[2]:
PX main midblock_route side1_route side2_route activation_date latitude longitude count_date 8hr_vel_vol 8hr_ped_vol
0 4 JARVIS ST NaN ADELAIDE ST E NaN 12-Sep-1958 43.651534 -79.372360 8/15/2012 19037 5679
1 14 BLOOR ST E NaN TED ROGERS WAY NaN 28-Mar-1951 43.671577 -79.380520 5/31/2012 16991 7835
2 22 CHURCH ST NaN GERRARD ST E NaN 18-Jul-1941 43.659858 -79.378522 9/5/2012 17164 14490
3 25 BLOOR ST E NaN CHURCH ST NaN 19-Feb-1932 43.671022 -79.383123 5/30/2012 18894 26556
4 27 VICTORIA ST NaN RICHMOND ST E NaN 03-Nov-1927 43.651952 -79.377756 7/10/2012 12770 6969
In [3]:
# check to make sure data types are correct
intersection_2012_df.dtypes
Out[3]:
PX                   int64
main                object
midblock_route      object
side1_route         object
side2_route         object
activation_date     object
latitude           float64
longitude          float64
count_date          object
8hr_vel_vol          int64
8hr_ped_vol          int64
dtype: object
In [4]:
# check number of records
intersection_2012_df.shape
Out[4]:
(468, 11)
In [5]:
# Data collected in 2013
intersection_2013_df = pd.read_csv("./Data/ped_vol_2013.csv",
                                   names=['PX', 'main', 'midblock_route', 'side1_route', 'side2_route', 'activation_date',
                                    'latitude', 'longitude', 'count_date', '8hr_vel_vol', '8hr_ped_vol'],
                                   skiprows=1)
intersection_2013_df.head()
Out[5]:
PX main midblock_route side1_route side2_route activation_date latitude longitude count_date 8hr_vel_vol 8hr_ped_vol
0 13 MOUNT PLEASANT RD NaN CHARLES ST E JARVIS ST 23-May-1950 43.669989 -79.379923 10/8/2013 15228 2269
1 23 CHURCH ST NaN CARLTON ST NaN 17-Feb-1932 43.661871 -79.379335 6/26/2013 19901 9052
2 55 YONGE ST NaN LAWRENCE AVE NaN 03-Jan-1941 43.725103 -79.402193 10/3/2013 29417 4196
3 81 UNIVERSITY AVE NaN DUNDAS ST W NaN 28-Jun-1928 43.654812 -79.388484 8/14/2013 28831 18932
4 86 AVENUE RD NaN BLOOR ST W NaN 21-Jul-1928 43.668685 -79.394119 5/27/2013 26039 14160
In [6]:
# check to make sure data types are correct
intersection_2013_df.dtypes
Out[6]:
PX                   int64
main                object
midblock_route      object
side1_route         object
side2_route         object
activation_date     object
latitude           float64
longitude          float64
count_date          object
8hr_vel_vol          int64
8hr_ped_vol          int64
dtype: object
In [7]:
# check number of records
intersection_2013_df.shape
Out[7]:
(295, 11)
In [8]:
# Data collected in 2014
intersection_2014_df = pd.read_csv("./Data/ped_vol_2014.csv",
                                   names=['PX', 'main', 'midblock_route', 'side1_route', 'side2_route', 'activation_date',
                                    'latitude', 'longitude', 'count_date', '8hr_vel_vol', '8hr_ped_vol'],
                                   skiprows=1)
intersection_2014_df.head()
Out[8]:
PX main midblock_route side1_route side2_route activation_date latitude longitude count_date 8hr_vel_vol 8hr_ped_vol
0 29 YONGE ST NaN FRONT ST NaN 24-Aug-1928 43.646846 -79.376931 5/6/2014 10737 17522
1 30 YONGE ST NaN WELLINGTON ST NaN 06-Jan-1964 43.647848 -79.377351 5/6/2014 13620 26656
2 33 YONGE ST NaN RICHMOND ST NaN 03-Nov-1927 43.651697 -79.378965 5/6/2014 12388 24487
3 34 YONGE ST NaN QUEEN ST NaN 03-Nov-1927 43.652421 -79.379276 5/5/2014 12035 26437
4 35 YONGE ST NaN SHUTER ST NaN 12-Mar-1959 43.654066 -79.379957 5/10/2014 7830 21137
In [9]:
# check to make sure data types are correct
intersection_2014_df.dtypes
Out[9]:
PX                   int64
main                object
midblock_route      object
side1_route         object
side2_route         object
activation_date     object
latitude           float64
longitude          float64
count_date          object
8hr_vel_vol          int64
8hr_ped_vol          int64
dtype: object
In [10]:
# check number of records
intersection_2013_df.shape
Out[10]:
(295, 11)
In [11]:
# Data collected in 2014
intersection_2015_df = pd.read_csv("./Data/ped_vol_2015.csv",
                                   names=['PX', 'main', 'midblock_route', 'side1_route', 'side2_route', 'activation_date',
                                    'latitude', 'longitude', 'count_date', '8hr_vel_vol', '8hr_ped_vol'],
                                   skiprows=1)
intersection_2015_df.head()
Out[11]:
PX main midblock_route side1_route side2_route activation_date latitude longitude count_date 8hr_vel_vol 8hr_ped_vol
0 8 JARVIS ST NaN DUNDAS ST E NaN 21-Jun-1928 43.657052 -79.374531 4/13/2015 18207 7458
1 15 CHURCH ST NaN FRONT ST E WELLINGTON ST E 24-Apr-1950 43.648513 -79.373834 4/20/2015 10467 7381
2 21 DUNDAS ST E NaN CHURCH ST NaN 17-Dec-1952 43.656499 -79.377127 4/13/2015 13764 10302
3 31 YONGE ST NaN KING ST NaN 03-Nov-1927 43.649162 -79.377904 1/22/2015 13527 33938
4 36 YONGE ST NaN DUNDAS ST NaN 04-Apr-1927 43.656326 -79.380912 4/11/2015 10925 34615
In [12]:
# check to make sure data types are correct
intersection_2015_df.dtypes
Out[12]:
PX                   int64
main                object
midblock_route      object
side1_route         object
side2_route         object
activation_date     object
latitude           float64
longitude          float64
count_date          object
8hr_vel_vol          int64
8hr_ped_vol          int64
dtype: object
In [13]:
# check number of records
intersection_2015_df.shape
Out[13]:
(293, 11)
In [14]:
# Data collected in 1999-2012, in the signalizedTrafficPedestrianVolumes.csv file 
# note that in the signalizedTrafficPedestrianVolumes.csv file 
# the 8hr_vel_vol, 8hr_ped_vol, 24hr_vel_vol, and 24hr_ped_vol columns
# contains comma in the number for the thousands marker. So you need to use the 
# keyword parameter thousands ="," to correly read the table.
# otherwise those columns will be returned as object(string) columns
intersection_1999_2012_df1 = pd.read_csv("./Data/signalizedTrafficPedestrianVolumes.csv",
                            names=['PX', 'main', 'midblock_route', 'side1_route', 'side2_route', 
                                   'activation_date','latitude', 'longitude', 'count_date',
                                   '8hr_vel_vol', '8hr_ped_vol', '24hr_ped_vol', '24hr_veh_vol'],
                             skiprows=1, thousands=",")
intersection_1999_2012_df1.head()
Out[14]:
PX main midblock_route side1_route side2_route activation_date latitude longitude count_date 8hr_vel_vol 8hr_ped_vol 24hr_ped_vol 24hr_veh_vol
0 2 JARVIS ST NaN FRONT ST E NaN 11/15/1948 43.649450 -79.371410 9/8/2011 17008 19335 34016 38670
1 3 KING ST E NaN JARVIS ST NaN 8/23/1950 43.650461 -79.371924 9/7/2011 37719 17665 75438 35330
2 4 JARVIS ST NaN ADELAIDE ST E NaN 9/12/1958 43.651534 -79.372360 6/16/2008 1991 19726 3982 39452
3 5 JARVIS ST NaN RICHMOND ST E NaN 4/21/1962 43.652718 -79.372824 7/30/2009 2696 24842 5392 49684
4 6 JARVIS ST NaN QUEEN ST E NaN 8/24/1928 43.653704 -79.373238 5/18/2011 3622 19772 7244 39544
In [15]:
# check to make sure data types are correct
intersection_1999_2012_df1.dtypes
Out[15]:
PX                   int64
main                object
midblock_route      object
side1_route         object
side2_route         object
activation_date     object
latitude           float64
longitude          float64
count_date          object
8hr_vel_vol          int64
8hr_ped_vol          int64
24hr_ped_vol         int64
24hr_veh_vol         int64
dtype: object
In [16]:
# check number of records
intersection_1999_2012_df1.shape
Out[16]:
(2209, 13)

Note
About intersection_1999_2012_df dataframe, if you look carefully, the 24hr_ped_vol and 24hr_veh_vol are just 2x 8hr_ped_vol and 2x 8hr_veh_vol respectively. They are redundant columns and can be dropped.

In [17]:
intersection_1999_2012_df=intersection_1999_2012_df1.drop(['24hr_ped_vol', '24hr_veh_vol'], axis = 1)
intersection_1999_2012_df.head()
Out[17]:
PX main midblock_route side1_route side2_route activation_date latitude longitude count_date 8hr_vel_vol 8hr_ped_vol
0 2 JARVIS ST NaN FRONT ST E NaN 11/15/1948 43.649450 -79.371410 9/8/2011 17008 19335
1 3 KING ST E NaN JARVIS ST NaN 8/23/1950 43.650461 -79.371924 9/7/2011 37719 17665
2 4 JARVIS ST NaN ADELAIDE ST E NaN 9/12/1958 43.651534 -79.372360 6/16/2008 1991 19726
3 5 JARVIS ST NaN RICHMOND ST E NaN 4/21/1962 43.652718 -79.372824 7/30/2009 2696 24842
4 6 JARVIS ST NaN QUEEN ST E NaN 8/24/1928 43.653704 -79.373238 5/18/2011 3622 19772
In [18]:
intersection_1999_2012_df1.dtypes
Out[18]:
PX                   int64
main                object
midblock_route      object
side1_route         object
side2_route         object
activation_date     object
latitude           float64
longitude          float64
count_date          object
8hr_vel_vol          int64
8hr_ped_vol          int64
24hr_ped_vol         int64
24hr_veh_vol         int64
dtype: object
In [19]:
# check number of records
intersection_1999_2012_df.shape
Out[19]:
(2209, 11)

Combine All Intersection Data into Single DataFrame

In [20]:
# check the total number of intersection records
total_intersection_records = (intersection_1999_2012_df.shape[0]+ intersection_2012_df.shape[0]
                              + intersection_2013_df.shape[0]+ intersection_2014_df.shape[0]
                              + intersection_2015_df.shape[0])
print total_intersection_records
3563
In [21]:
# combine all intersection data 
frames =[intersection_1999_2012_df, intersection_2012_df, 
         intersection_2013_df, intersection_2014_df,
         intersection_2015_df]
intersection_all_df = pd.concat(frames)
intersection_all_df.head()
Out[21]:
PX main midblock_route side1_route side2_route activation_date latitude longitude count_date 8hr_vel_vol 8hr_ped_vol
0 2 JARVIS ST NaN FRONT ST E NaN 11/15/1948 43.649450 -79.371410 9/8/2011 17008 19335
1 3 KING ST E NaN JARVIS ST NaN 8/23/1950 43.650461 -79.371924 9/7/2011 37719 17665
2 4 JARVIS ST NaN ADELAIDE ST E NaN 9/12/1958 43.651534 -79.372360 6/16/2008 1991 19726
3 5 JARVIS ST NaN RICHMOND ST E NaN 4/21/1962 43.652718 -79.372824 7/30/2009 2696 24842
4 6 JARVIS ST NaN QUEEN ST E NaN 8/24/1928 43.653704 -79.373238 5/18/2011 3622 19772
In [22]:
print total_intersection_records == intersection_all_df.shape[0]
True
In [23]:
# PX is a unique ID for an intersection, if we sort the combined dataframe by 'PX'
# We can see that some of the intersections have multiple records of pedestrian
# and vehicle volume data.
intersection_all_df.sort_values('PX')
Out[23]:
PX main midblock_route side1_route side2_route activation_date latitude longitude count_date 8hr_vel_vol 8hr_ped_vol
0 2 JARVIS ST NaN FRONT ST E NaN 11/15/1948 43.649450 -79.371410 9/8/2011 17008 19335
1 3 KING ST E NaN JARVIS ST NaN 8/23/1950 43.650461 -79.371924 9/7/2011 37719 17665
2 4 JARVIS ST NaN ADELAIDE ST E NaN 9/12/1958 43.651534 -79.372360 6/16/2008 1991 19726
0 4 JARVIS ST NaN ADELAIDE ST E NaN 12-Sep-1958 43.651534 -79.372360 8/15/2012 19037 5679
3 5 JARVIS ST NaN RICHMOND ST E NaN 4/21/1962 43.652718 -79.372824 7/30/2009 2696 24842
4 6 JARVIS ST NaN QUEEN ST E NaN 8/24/1928 43.653704 -79.373238 5/18/2011 3622 19772
5 7 JARVIS ST NaN SHUTER ST NaN 11/18/1948 43.655357 -79.373862 5/24/2011 1979 17492
6 8 JARVIS ST NaN DUNDAS ST E NaN 6/21/1928 43.657052 -79.374531 5/18/2011 5651 20116
0 8 JARVIS ST NaN DUNDAS ST E NaN 21-Jun-1928 43.657052 -79.374531 4/13/2015 18207 7458
7 9 JARVIS ST NaN GERRARD ST E NaN 7/14/1941 43.660432 -79.375854 5/18/2011 4284 19891
8 10 JARVIS ST NaN CARLTON ST NaN 6/28/1928 43.662420 -79.376708 5/24/2011 4189 21040
9 11 JARVIS ST NaN WELLESLEY ST E NaN 9/28/1948 43.666289 -79.378325 5/24/2011 5001 21072
10 12 JARVIS ST NaN ISABELLA ST NaN 9/13/1960 43.668869 -79.379416 5/18/2011 1679 15958
0 13 MOUNT PLEASANT RD NaN CHARLES ST E JARVIS ST 23-May-1950 43.669989 -79.379923 10/8/2013 15228 2269
11 13 MOUNT PLEASANT RD NaN CHARLES ST E JARVIS ST 5/23/1950 43.669989 -79.379923 5/24/2011 2094 20343
12 14 BLOOR ST E NaN TED ROGERS WAY NaN 3/28/1951 43.671577 -79.380520 12/8/2009 4426 15399
1 14 BLOOR ST E NaN TED ROGERS WAY NaN 28-Mar-1951 43.671577 -79.380520 5/31/2012 16991 7835
13 15 CHURCH ST NaN FRONT ST E WELLINGTON ST E 4/24/1950 43.648513 -79.373834 4/23/2009 8053 16666
1 15 CHURCH ST NaN FRONT ST E WELLINGTON ST E 24-Apr-1950 43.648513 -79.373834 4/20/2015 10467 7381
14 16 KING ST E NaN CHURCH ST NaN 1/10/1928 43.649916 -79.374409 4/23/2009 11277 13419
15 17 CHURCH ST NaN ADELAIDE ST E NaN 11/22/1948 43.651173 -79.374925 7/27/2011 5765 12868
16 18 CHURCH ST NaN RICHMOND ST E NaN 11/22/1948 43.652441 -79.375448 11/19/2007 4569 18354
17 19 QUEEN ST E NaN CHURCH ST NaN 8/24/1983 43.653170 -79.375754 8/16/2011 13157 12609
18 20 CHURCH ST NaN SHUTER ST NaN 11/18/1948 43.654817 -79.376441 11/16/2009 7480 6523
19 21 DUNDAS ST E NaN CHURCH ST NaN 12/17/1952 43.656499 -79.377127 11/17/2009 8268 11280
2 21 DUNDAS ST E NaN CHURCH ST NaN 17-Dec-1952 43.656499 -79.377127 4/13/2015 13764 10302
20 22 CHURCH ST NaN GERRARD ST E NaN 7/18/1941 43.659858 -79.378522 3/4/2008 5880 18093
2 22 CHURCH ST NaN GERRARD ST E NaN 18-Jul-1941 43.659858 -79.378522 9/5/2012 17164 14490
21 23 CHURCH ST NaN CARLTON ST NaN 2/17/1932 43.661871 -79.379335 8/10/2011 13739 16193
1 23 CHURCH ST NaN CARLTON ST NaN 17-Feb-1932 43.661871 -79.379335 6/26/2013 19901 9052
... ... ... ... ... ... ... ... ... ... ... ...
292 2332 STEELES AVE W NaN GIHON SPRING DR PRIVATE ACCESS 25-Feb-2013 43.759258 -79.597022 5/15/2015 18164 101
293 2333 FINCH AVE W NaN MILADY RD NaN 05-Feb-2013 43.747310 -79.564383 12/9/2013 15773 368
291 2335 THE EAST MALL NaN VALHALLA INN RD NaN 21-May-2014 43.641464 -79.557686 12/16/2014 10997 258
292 2337 THE EAST MALL NaN YARN RD NaN 05-Jun-2013 43.632309 -79.553688 9/29/2014 8575 615
293 2338 BONIS AVE NaN KING HENRYS BLVD CARABOB CRT 11-Oct-2013 43.784094 -79.297152 12/18/2014 4677 1734
294 2347 LANSDOWNE AVE NaN SEAFORTH AVE NaN 24-Jul-2013 43.642696 -79.437452 12/18/2014 6127 1245
462 2347 LANSDOWNE AVE NaN SEAFORTH AVE NaN 24-Jul-2013 43.642696 -79.437452 3/8/2012 5629 1454
295 2349 WELLESLEY ST E NaN HOMEWOOD AVE NaN 08-Aug-2013 43.666490 -79.375829 12/16/2014 6199 2798
294 2364 LAWRENCE AVE W NaN CORONA ST PRIVATE ACCESS 19-Nov-2013 43.714339 -79.452380 6/13/2013 12634 180
463 2366 DUNDAS ST W NaN STERLING RD PRIVATE ACCESS 05-Nov-2013 43.650654 -79.443039 1/24/2012 12167 782
296 2366 DUNDAS ST W NaN STERLING RD PRIVATE ACCESS 05-Nov-2013 43.650654 -79.443039 9/9/2014 13910 955
464 2374 BAY ST NaN CUMBERLAND ST NaN 20-Mar-2014 43.670549 -79.389803 9/5/2012 9558 8210
297 3002 ST DENNIS DR NaN DEAUVILLE LANE NaN 05-Mar-1976 43.718306 -79.331607 7/22/2014 5921 982
2195 3002 ST DENNIS DR NaN DEAUVILLE LANE NaN 3/5/1976 43.718306 -79.331607 11/2/2009 796 5898
2196 3005 WINDERMERE AVE NaN MORNINGSIDE AVE NaN 1/1/1967 43.644885 -79.478592 12/10/2009 634 6488
2197 3006 GATEWAY BLVD NaN GRENOBLE DR NaN 5/28/1976 43.711967 -79.330476 1/25/2011 1185 5500
2198 3009 ALLIANCE AVE NaN ROCKCLIFFE BLVD NaN 2/13/1969 43.676947 -79.489360 3/10/2009 622 7916
2199 3011 OLD FINCH AVE 250m EAST OF SEWELLS RD NaN 4/11/1974 43.825259 -79.196812 11/27/2008 0 1081
2200 3014 ZOO RD NaN MEADOWVALE RD N ZOO W RAMP ZOO MEADOWVALE RD N RAMP 2/8/1974 43.818187 -79.173580 7/26/2009 16 1982
2201 4001 QUEENS QUAY W NaN LOWER SPADINA AVE NaN 8/22/1990 43.637670 -79.391910 7/17/2007 1804 7466
465 4001 QUEENS QUAY W NaN LOWER SPADINA AVE NaN 22-Aug-1990 43.637670 -79.391910 6/7/2012 8423 2887
2202 4002 QUEENS QUAY W 95m EAST OF LOWER SPADINA AVE PRIVATE ACCESS 8/23/1990 43.637833 -79.390967 4/22/2008 1604 7022
2203 4003 QUEENS QUAY W NaN REES ST ROBERTSON CRES 6/18/1990 43.638798 -79.386772 9/27/2007 1456 5754
466 4003 QUEENS QUAY W NaN REES ST ROBERTSON CRES 18-Jun-1990 43.638798 -79.386772 2/15/2012 6665 2434
2204 4004 QUEENS QUAY W NaN LOWER SIMCOE ST NaN 9/25/1990 43.639449 -79.382924 4/23/2009 2104 7181
2205 4005 QUEENS QUAY W NaN YORK ST HARBOUR SQ 8/8/1961 43.639908 -79.380399 4/22/2009 4629 9230
2206 4006 QUEENS QUAY W 100m EAST OF YORK ST PRIVATE ACCESS 2/8/1990 43.640323 -79.379279 1/21/2008 1516 7575
467 4006 QUEENS QUAY W 100m EAST OF YORK ST PRIVATE ACCESS 08-Feb-1990 43.640323 -79.379279 8/7/2012 8445 5499
2207 4007 QUEENS QUAY W NaN BAY ST HARBOUR SQ 5/11/1963 43.641096 -79.377041 4/22/2009 4381 9423
2208 4008 QUEENS QUAY W NaN DAN LECKIE WAY NaN 7/14/2000 43.636616 -79.396929 11/19/2007 563 4713

3563 rows × 11 columns

Average pedestrian and vehicle volume for intersections with multiple record

We only want unique intersections in the final dataframe, so we need to average the multiple records for a single intersection.

  1. Create two dataframes from intersection_ped_df data frame.
  2. The first dataframe contain colums PX, main, midblock_route, side1_route, side2_route, latitude, and longitude. Remove duplicates based on PX, so we have unique intersections.
  3. Second dataframe contain columns PX, average 8hr_vel_vol, and average 8hr_ped_vol
  4. Join the two dataframes by PX
In [24]:
# first dataframe 
unique_intersection_df = intersection_all_df[['PX', 'main', 'midblock_route', 'side1_route', 
                              'side2_route', 'latitude', 'longitude']].drop_duplicates('PX')

unique_intersection_df.count() #2256 unique intersections
Out[24]:
PX                2256
main              2256
midblock_route     171
side1_route       2256
side2_route        766
latitude          2256
longitude         2256
dtype: int64
In [25]:
# seond dataframe
avg_ped_vel_df = intersection_all_df[['PX', '8hr_vel_vol', '8hr_ped_vol']].groupby('PX').mean()
avg_ped_vel_df.count()
Out[25]:
8hr_vel_vol    2256
8hr_ped_vol    2256
dtype: int64
In [26]:
avg_ped_vel_df.head()
Out[26]:
8hr_vel_vol 8hr_ped_vol
PX
2 17008.0 19335.0
3 37719.0 17665.0
4 10514.0 12702.5
5 2696.0 24842.0
6 3622.0 19772.0
In [27]:
# Join the two dataframes
# This dataframe will be the final dataframe for analysis use
intersection_ped_df = unique_intersection_df.join(avg_ped_vel_df, on='PX', how = 'inner')
intersection_ped_df.head()
Out[27]:
PX main midblock_route side1_route side2_route latitude longitude 8hr_vel_vol 8hr_ped_vol
0 2 JARVIS ST NaN FRONT ST E NaN 43.649450 -79.371410 17008.0 19335.0
1 3 KING ST E NaN JARVIS ST NaN 43.650461 -79.371924 37719.0 17665.0
2 4 JARVIS ST NaN ADELAIDE ST E NaN 43.651534 -79.372360 10514.0 12702.5
3 5 JARVIS ST NaN RICHMOND ST E NaN 43.652718 -79.372824 2696.0 24842.0
4 6 JARVIS ST NaN QUEEN ST E NaN 43.653704 -79.373238 3622.0 19772.0
In [28]:
intersection_ped_df.dtypes
Out[28]:
PX                  int64
main               object
midblock_route     object
side1_route        object
side2_route        object
latitude          float64
longitude         float64
8hr_vel_vol       float64
8hr_ped_vol       float64
dtype: object
In [29]:
intersection_ped_df.shape
Out[29]:
(2256, 9)

We have 2256 unique major intersections that have pedestrian and vehicle volume counted. We will select intersections from these ones to place map.

2.2 Cultural Facility Data

Column Names of the Dataset


ADD_NUM = ADDRESS_NUMBER (Street number)
LF_NAME = LINEAR_NAME_FULL (Street Name)
ADDRESS = ADDRESS_FULL (Full address)
POSTAL_CD = POSTAL_CODE (POSTAL CODE)
CITY = CITY
X = X (Easting in MTM NAD27 3 degree Projection)
Y = Y (Northing in MTM NAD27 3 degree Projection)
LONGITUDE = LONGITUDE (LONGITUDE = Longitude in WGS84 Coordinate System)
LATITUDE = LATITUDE (Latitude in WGS84 Coordinate System)
FAC_NAM = FACILITY_NAME (FACILITY NAME)
STE_FLR_UN = SUITE_FLOOR_UNIT (SUITE FLOOR UNIT)
PERFRMANCE = PERFORMANCE (Spaces in which performing arts (dance, music, theatre, etc.) creation or presentation takes place)
EXHBVISARTT = EXHIBITION_VISUAL_ARTS (Spaces in which visual arts creation or presentation can take place, in addition to pure exhibition space.)
SCRN_BASED = SCREEN_BASED (Spaces for the production and presentation of multimedia screen-based arts including digital, )
LIBRARY = LIBRARY (Toronto Public Library facility with physical space for cultural activity)
MULTIPURP = MULTIPURPOSE (Spaces that are not purpose-built and can house a range of cultural activity across disciplines.)
HERITAGE = HERITAGE (Facilities where heritage activity takes place (historical societies, archives, community museums etc.)
OWNERSHIP = OWNERSHIP (OWNERSHIP)
OBJECTID = OBJECTID (Unique system identifier)

In [30]:
# use lower case column names for facility dataframe
col_names = ['add_num', 'lf_name', 'address', 'postal_cd', 'city', 'x', 'y', 'longitude', 'latitude', 'fac_name',
        'ste_flr_un', 'performance', 'exhbvisart', 'scrn_based', 'library', 'multipurp', 'heritage', 'ownership', 'objectID']
In [31]:
facility_df = pd.read_csv('./Data/facilities/make_space_for_culture_wgs84/MAKE_SPACE_FOR_CULTURE_WGS84.csv',
                         names = col_names, skiprows =1)
facility_df.head()
Out[31]:
add_num lf_name address postal_cd city x y longitude latitude fac_name ste_flr_un performance exhbvisart scrn_based library multipurp heritage ownership objectID
0 3465 Dundas St W 3465 Dundas St W M6S 2S5 Toronto 305795.585 4835942.551 -79.487461 43.665434 Club Hispano NaN 1 1 0 0 0 0 Owned by others 191
1 181 Glenlake Ave 181 Glenlake Ave M6P 4B6 Toronto 307834.850 4835127.684 -79.462181 43.658093 Keele Community Centre NaN 0 0 0 0 1 0 City Owned 194
2 288 Humberside Ave 288 Humberside Ave M6P 1L5 Toronto 307268.847 4835411.316 -79.469196 43.660649 St. Johns West Toronto Anglican Church NaN 1 0 0 0 1 0 Owned by others 195
3 238 Jane St 238 Jane St M6S 3Z1 Toronto 305831.038 4834780.388 -79.487021 43.654964 Art Works Art School NaN 0 1 0 0 0 0 Owned by others 196
4 95 Lavinia Ave 95 Lavinia Ave M6S 3H9 Toronto 306568.733 4833967.983 -79.477882 43.647658 Swansea Memorial Library NaN 0 0 0 1 0 0 City Owned 199
In [32]:
facility_df.dtypes
Out[32]:
add_num         object
lf_name         object
address         object
postal_cd       object
city            object
x              float64
y              float64
longitude      float64
latitude       float64
fac_name        object
ste_flr_un      object
performance      int64
exhbvisart       int64
scrn_based       int64
library          int64
multipurp        int64
heritage         int64
ownership       object
objectID         int64
dtype: object
In [33]:
# 1397 cultural facilities
facility_df.shape
Out[33]:
(1397, 19)

We have 1379 cultural facilities in City of Toronto

3 Exploratory Data Analysis

3.1 Intersections and Cultural Facilities Distribution Plot

In [34]:
# plot intersection using their geographic coordinates
fig, ax = plt.subplots(figsize=[20, 12])
intersection_scatter = ax.scatter(intersection_ped_df['longitude'], intersection_ped_df['latitude'], c='b', edgecolor='None', alpha=0.9, s=12)
ax.set_title('City of Toronto Major Intersections', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=20)
ax.set_ylabel('Latitude', fontsize =20)
ax.legend([intersection_scatter], ['Intersections'], loc='upper right', fontsize = 20)
plt.show()

From the graph, we can see the shape of City of Toronto and some of the main streets.

In [35]:
### Plot intersections on interactive Map
intersection_map = folium.Map(location = [43.6532, -79.3832])
intersection_map.save('intersections.html')
marker_cluster_intersection = folium.MarkerCluster().add_to(intersection_map)
for index, row in intersection_ped_df.iterrows():
    folium.Marker([row["latitude"],row["longitude"]] ).add_to(marker_cluster_intersection )
intersection_map.save('intersections.html')
intersection_map
Out[35]:

The interactive map gives up more context about each intersection

In [36]:
# plot Cultural facilities distribution based on
# geographic coordinates
fig, ax = plt.subplots(figsize=[20, 12])
facility_scatter = ax.scatter(facility_df['longitude'], facility_df['latitude'], 
                              c='#99cc99', edgecolor='None', alpha=0.9, s=120)
ax.set_title('City of Toronto Cultural Facilities', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=20)
ax.set_ylabel('Latitude', fontsize =20)
ax.legend([facility_scatter], ['Cultural Facilities'], loc='upper right', fontsize = 20)
plt.show()

It is no surprise that cultural facilities are condensed in downtown area of City of Toronto.

Plot Cultural Facilities on Interactive Map

In [37]:
facility_map = folium.Map(location = [43.6532, -79.3832])
facility_map.save('facilities.html')
marker_cluster_facility = folium.MarkerCluster().add_to(facility_map)
for index, row in facility_df.iterrows():
    folium.Marker([row["latitude"],row["longitude"]] ).add_to(marker_cluster_facility)
facility_map.save('facilities.html')
facility_map
Out[37]:

3.2 Clustering Facilities Using DBSCAN Algorithm

We first try DBSCAN algorithm in scikit-learn for clustering. DBSCAN is base on the paper:

  • “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise” Ester, M., H. P. Kriegel, J. Sander, and X. Xu, In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226–231. 1996

This algorithm is aimed for spatial data, which might be a good fit for our case.

The following is a description of DBSCAN algorithm from sklearn:

The DBSCAN algorithm views clusters as areas of high density separated by areas of low density. Due to this rather generic view, clusters found by DBSCAN can be any shape, as opposed to k-means which assumes that clusters are convex shaped. The central component to the DBSCAN is the concept of core samples, which are samples that are in areas of high density. A cluster is therefore a set of core samples, each close to each other (measured by some distance measure) and a set of non-core samples that are close to a core sample (but are not themselves core samples). There are two parameters to the algorithm, min_samples and eps, which define formally what we mean when we say dense. Higher min_samples or lower eps indicate higher density necessary to form a cluster.

DBSCAN algorithm clusters dataset based on two parameters:

  • eps - The max distance between neighbour points to be considerted in a cluster

  • min_samples - the minimum cluster size. If it is set to 1, it means every data point will be assigned to either a cluster or form its own cluster of 1 data point. If min_sample is set to be larger than one, then cluster with size less than min_sample will be considered as noise.

The scikit-learn DBSCAN haversine distance metric requires data in the form of [latitude, longitude] and both inputs and outputs are in units of radians.

Reference
Clustering to Reduce Spatial Data Set Size

Choose Parameters for DBSCAN clustering algorithm

Since in our facility dataset, we need to consider all facilities, we don't want any of them to be classified as noise, so we set min_samples=1. Now our clustering will depend on a proper eps value.

Let's try esp = 1.5, 1.0, 0.7 (km)

Note that eps need to be converted to radians for use by harversine

In [38]:
# define the number of kilometers in one radiation
# which will be used to convert esp from km to radiation
kms_per_rad = 6371.0088
In [39]:
# define a function to calculate the geographic coordinate 
# centroid of a cluster of geographic points
# it will be used later to calculate the centroids of DBSCAN cluster
# because Scikit-learn DBSCAN cluster algorithm does not calculate centroid
def get_centroid(cluster):
  """calculate the centroid of a cluster of geographic coordinate points
  Args:
    cluster: cluster coordinates, nx2 array-like (array, list of lists, etc) 
    n is the number of coordinate points(latitude, longitude)in the cluster.
  Return:
    centroid: numpy array, geometry centroid of the cluster
    
  """
  cluster_ary = np.asarray(cluster)
  centroid = cluster_ary.mean(axis = 0)
  return centroid

# testing get_centroid function
test_cluster= [[ 43.70487299, -79.57753802], 
               [ 43.71138367, -79.56524418],
               [ 43.72616079, -79.57319998],
               [ 43.73547907, -79.56258364],
               [ 43.72070325, -79.57202018],
               [ 43.73126031, -79.5598719 ]]
test_centroid = get_centroid(test_cluster)
print test_centroid
print type(test_centroid)
[ 43.72164335 -79.56840965]
<type 'numpy.ndarray'>

3.2.1 DBSCAN Clustering with eps = 1.5

In [40]:
# convert eps to radians for use by haversine
epsilon = 1.5/kms_per_rad

# Extract intersection coordinates (latitude, longitude)
fac_coords = facility_df.as_matrix(columns = ['latitude', 'longitude'])

start_time = time.time()
dbsc = (DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine')
        .fit(np.radians(fac_coords)))
fac_cluster_labels = dbsc.labels_

# get the number of clusters
num_clusters = len(set(dbsc.labels_))

# print the outcome
message = 'Clustered {:,} points down to {:,} clusters, for {:.1f}% compression in {:,.2f} seconds'
print(message.format(len(facility_df), num_clusters, 100*(1 - float(num_clusters) / len(facility_df)), time.time()-start_time))
print('Silhouette coefficient: {:0.03f}'.format(metrics.silhouette_score(fac_coords, fac_cluster_labels)))

# turn the clusters into a pandas series,where each element is a cluster of points
dbsc_clusters = pd.Series([fac_coords[fac_cluster_labels==n] for n in range(num_clusters)])
Clustered 1,397 points down to 20 clusters, for 98.6% compression in 0.27 seconds
Silhouette coefficient: -0.165
In [41]:
# get centroid of each cluster
fac_centroids = dbsc_clusters.map(get_centroid)
# unzip the list of centroid points (lat, lon) tuples into separate lat and lon lists
cent_lats, cent_lons = zip(*fac_centroids)
# from these lats/lons create a new df of one representative point for eac cluster
centroids_df = pd.DataFrame({'longitude':cent_lons, 'latitude':cent_lats})
In [42]:
# Plot the facility clusters and cluster centroid
fig, ax = plt.subplots(figsize=[20, 12])
facility_scatter = ax.scatter(facility_df['longitude'], facility_df['latitude'], 
                              c=fac_cluster_labels, cmap = cm.Dark2, 
                              edgecolor='None', alpha=0.7, s=120)
centroid_scatter = ax.scatter(centroids_df['longitude'], centroids_df['latitude'], 
                              marker='x', linewidths=2, c='k', s=50)
ax.set_title('DBSCAN eps = 1.5, Facility Clusters & Facility Cluster Centroids', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=24)
ax.set_ylabel('Latitude', fontsize = 24)
ax.legend([facility_scatter, centroid_scatter],
          ['Facilities', 'Facility Cluster Centroid'], 
          loc='upper right', fontsize = 20)
plt.show()

As we can see when we set eps = 1.5, the DBSCAN algorithm divided the facilities into 20 clusters.

3.2.2 DBSCAN Clustering with eps = 0.7

In [43]:
# convert eps to radians for use by haversine
epsilon = 0.7/kms_per_rad

# Extract intersection coordinates (latitude, longitude)
fac_coords = facility_df.as_matrix(columns = ['latitude', 'longitude'])

start_time = time.time()
dbsc = (DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine')
        .fit(np.radians(fac_coords)))
fac_cluster_labels = dbsc.labels_

# get the number of clusters
num_clusters = len(set(dbsc.labels_))

# print the outcome
message = 'Clustered {:,} points down to {:,} clusters, for {:.1f}% compression in {:,.2f} seconds'
print(message.format(len(facility_df), num_clusters, 100*(1 - float(num_clusters) / len(facility_df)), time.time()-start_time))
print('Silhouette coefficient: {:0.03f}'.format(metrics.silhouette_score(fac_coords, fac_cluster_labels)))

# turn the clusters into a pandas series,where each element is a cluster of points
dbsc_clusters = pd.Series([fac_coords[fac_cluster_labels==n] for n in range(num_clusters)])
Clustered 1,397 points down to 185 clusters, for 86.8% compression in 0.05 seconds
Silhouette coefficient: 0.004
In [44]:
# get centroid of each cluster
fac_centroids = dbsc_clusters.map(get_centroid)
# unzip the list of centroid points (lat, lon) tuples into separate lat and lon lists
cent_lats, cent_lons = zip(*fac_centroids)
# from these lats/lons create a new df of one representative point for eac cluster
centroids_df = pd.DataFrame({'longitude':cent_lons, 'latitude':cent_lats})
In [45]:
# Plot the facility clusters and cluster centroid
fig, ax = plt.subplots(figsize=[20, 12])
facility_scatter = ax.scatter(facility_df['longitude'], facility_df['latitude'],
                              c=fac_cluster_labels, cmap = cm.Dark2, edgecolor='None', alpha=0.7, s=120)
centroid_scatter = ax.scatter(centroids_df['longitude'], centroids_df['latitude'],
                              marker='x', linewidths=2, c='k', s=50)
ax.set_title('DBSCAN eps = 0.7, Facility Clusters & Facility Cluster Centroids', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=24)
ax.set_ylabel('Latitude', fontsize = 24)
ax.legend([facility_scatter, centroid_scatter], ['Facilities', 'Facility Cluster Centroid'],
          loc='upper right', fontsize = 20)
plt.show()

When we set eps = 0.7, the DBSCAN algorithm divided the facilities into 185 clusters.

3.2.3 DBSCAN Clustering with eps = 1.0 km

In [46]:
# convert eps to radians for use by haversine
epsilon = 1.0/kms_per_rad

# Extract intersection coordinates (latitude, longitude)
fac_coords = facility_df.as_matrix(columns = ['latitude', 'longitude'])

start_time = time.time()
dbsc = (DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine')
        .fit(np.radians(fac_coords)))
fac_cluster_labels = dbsc.labels_

# get the number of clusters
num_clusters = len(set(dbsc.labels_))

# print the outcome
message = 'Clustered {:,} points down to {:,} clusters, for {:.1f}% compression in {:,.2f} seconds'
print(message.format(len(facility_df), num_clusters, 100*(1 - float(num_clusters) / len(facility_df)), time.time()-start_time))
print('Silhouette coefficient: {:0.03f}'.format(metrics.silhouette_score(fac_coords, fac_cluster_labels)))

# turn the clusters into a pandas series,where each element is a cluster of points
dbsc_clusters = pd.Series([fac_coords[fac_cluster_labels==n] for n in range(num_clusters)])
Clustered 1,397 points down to 93 clusters, for 93.3% compression in 0.06 seconds
Silhouette coefficient: -0.028
In [47]:
# get centroid of each cluster
fac_centroids = dbsc_clusters.map(get_centroid)
# unzip the list of centroid points (lat, lon) tuples into separate lat and lon lists
cent_lats, cent_lons = zip(*fac_centroids)
# from these lats/lons create a new df of one representative point for eac cluster
centroids_df = pd.DataFrame({'longitude':cent_lons, 'latitude':cent_lats})
In [48]:
# Plot the facility clusters and cluster centroid
fig, ax = plt.subplots(figsize=[20, 12])
facility_scatter = ax.scatter(facility_df['longitude'], facility_df['latitude'],c=fac_cluster_labels,
                              cmap = cm.Dark2, edgecolor='None', alpha=0.7, s=120)
centroid_scatter = ax.scatter(centroids_df['longitude'], centroids_df['latitude'], 
                              marker='x', linewidths=2, c='k', s=50)
ax.set_title('DBSCAN eps = 1.0, Facility Clusters & Facility Cluster Centroids', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=24)
ax.set_ylabel('Latitude', fontsize=24)
ax.legend([facility_scatter, centroid_scatter], ['Facilities', 'Facility Cluster Centroid'],
          loc='upper right', fontsize = 20)
plt.show()

eps = 1.0. the DBSCAN algorithm divided the facilities into 93 clusters.

3.3 Clustering Facilities Using KMeans Algorithm

3.3.1 KMeans Clustering with 200 Clusters

In [49]:
n_clusters = 200
fac_coords = facility_df.as_matrix(columns=['latitude', 'longitude'])
KM200_model = KMeans(n_clusters = n_clusters)
KM200_model.fit(fac_coords)
Out[49]:
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=200, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=None, tol=0.0001,
    verbose=0)
In [50]:
# turn the clusters in to a pandas series, where each element is a cluster of points
KM200_clusters = pd.Series([fac_coords[KM200_model.labels_==n] for n in range(n_clusters )])

# centroids of 200 clusters
fac200_centroids = KM200_model.cluster_centers_

# Labels
KM200_label = KM200_model.labels_
In [51]:
# Plot acility clusters and cluster centroid
fig, ax = plt.subplots(figsize=[20, 12])
facility_scatter = ax.scatter(facility_df['longitude'],  facility_df['latitude'], c=KM200_label,
                              cmap = cm.Dark2, edgecolor='None', alpha=0.7, s=120)
centroid_scatter = ax.scatter(fac200_centroids[:,1], fac200_centroids[:,0], marker='x', linewidths=2, c='k', s=30)
ax.set_title('KMeans 200 Facility Clusters & Facility Cluster Centroids', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=24)
ax.set_ylabel('Latitude', fontsize = 24)
ax.legend([facility_scatter, centroid_scatter], ['Facilities', 'Facility Cluster Centroid'], loc='upper right', fontsize = 20)
plt.show()
In [52]:
# Add cluster labels to facility_df dataframe
labeled_KM200_facility_df = facility_df
labeled_KM200_facility_df = facility_df
labeled_KM200_facility_df['label'] = KM200_label
labeled_KM200_facility_df.head()
Out[52]:
add_num lf_name address postal_cd city x y longitude latitude fac_name ste_flr_un performance exhbvisart scrn_based library multipurp heritage ownership objectID label
0 3465 Dundas St W 3465 Dundas St W M6S 2S5 Toronto 305795.585 4835942.551 -79.487461 43.665434 Club Hispano NaN 1 1 0 0 0 0 Owned by others 191 28
1 181 Glenlake Ave 181 Glenlake Ave M6P 4B6 Toronto 307834.850 4835127.684 -79.462181 43.658093 Keele Community Centre NaN 0 0 0 0 1 0 City Owned 194 143
2 288 Humberside Ave 288 Humberside Ave M6P 1L5 Toronto 307268.847 4835411.316 -79.469196 43.660649 St. Johns West Toronto Anglican Church NaN 1 0 0 0 1 0 Owned by others 195 53
3 238 Jane St 238 Jane St M6S 3Z1 Toronto 305831.038 4834780.388 -79.487021 43.654964 Art Works Art School NaN 0 1 0 0 0 0 Owned by others 196 196
4 95 Lavinia Ave 95 Lavinia Ave M6S 3H9 Toronto 306568.733 4833967.983 -79.477882 43.647658 Swansea Memorial Library NaN 0 0 0 1 0 0 City Owned 199 134
In [53]:
labeled_KM200_facility_df.groupby('label').count()[['add_num', 'address']].describe()
Out[53]:
add_num address
count 200.000000 200.000000
mean 6.985000 6.985000
std 7.296642 7.296642
min 1.000000 1.000000
25% 2.000000 2.000000
50% 4.000000 4.000000
75% 9.000000 9.000000
max 48.000000 48.000000

The smallest cluster has 1 facility, the largest clusters contains 46 facilities

3.3.2 KMeans Clustering with 300 Clusters

In [54]:
n_clusters = 300
fac_coords = facility_df.as_matrix(columns=['latitude', 'longitude'])
KM300_model = KMeans(n_clusters =n_clusters)
KM300_model.fit(fac_coords)
Out[54]:
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=300, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=None, tol=0.0001,
    verbose=0)
In [55]:
# turn the clusters in to a pandas series, where each element is a cluster of points
KM300_clusters = pd.Series([fac_coords[KM300_model.labels_==n] for n in range(n_clusters )])

# centroids of 200 clusters
fac300_centroids = KM300_model.cluster_centers_

# Labels
KM300_label = KM300_model.labels_
In [56]:
# Plot the facility clusters and cluster centroids
fig, ax = plt.subplots(figsize=[20, 12])
facility_scatter = ax.scatter(facility_df['longitude'],  facility_df['latitude'], c=KM300_label,
                              cmap = cm.Dark2, edgecolor='None', alpha=0.7, s=120)
centroid_scatter = ax.scatter(fac300_centroids[:,1], fac300_centroids[:,0], 
                              marker='x', linewidths=2, c='k', s=30)
ax.set_title('KMeans 300 Facility Clusters & Facility Cluster Centroids', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=24)
ax.set_ylabel('Latitude', fontsize = 24)
ax.legend([facility_scatter, centroid_scatter],
          ['Facilities', 'Facility Cluster Centroid'],
          loc='upper right', fontsize = 20)
plt.show()
In [57]:
# Plot facilities and facility cluster centroids on interactive map
facility_centroid_map = folium.Map(location = [43.6532, -79.3832])
facility_centroid_map.save('facility_centroids.html')

# Add markers of facilities
facility_cluster = folium.MarkerCluster().add_to(facility_centroid_map)
for index, row in facility_df.iterrows():
    folium.Marker([row["latitude"],row["longitude"]] ).add_to(facility_cluster)    
    
# Add markers of clusters centroids
centroid_cluster = folium.MarkerCluster().add_to(facility_centroid_map)
for index, row in centroids_df.iterrows():
    folium.RegularPolygonMarker([row['latitude'], row['longitude']],
                 fill_color='#769d96', number_of_sides=8, radius=6, popup='cluster centroid').add_to(centroid_cluster)

facility_centroid_map.save('facility_centroids.html')
facility_centroid_map
Out[57]:
In [58]:
# Add cluster labels to facility_df dataframe
labeled_KM300_facility_df = facility_df
labeled_KM300_facility_df = facility_df
labeled_KM300_facility_df['label'] = KM300_label
labeled_KM300_facility_df.head()
Out[58]:
add_num lf_name address postal_cd city x y longitude latitude fac_name ste_flr_un performance exhbvisart scrn_based library multipurp heritage ownership objectID label
0 3465 Dundas St W 3465 Dundas St W M6S 2S5 Toronto 305795.585 4835942.551 -79.487461 43.665434 Club Hispano NaN 1 1 0 0 0 0 Owned by others 191 285
1 181 Glenlake Ave 181 Glenlake Ave M6P 4B6 Toronto 307834.850 4835127.684 -79.462181 43.658093 Keele Community Centre NaN 0 0 0 0 1 0 City Owned 194 239
2 288 Humberside Ave 288 Humberside Ave M6P 1L5 Toronto 307268.847 4835411.316 -79.469196 43.660649 St. Johns West Toronto Anglican Church NaN 1 0 0 0 1 0 Owned by others 195 164
3 238 Jane St 238 Jane St M6S 3Z1 Toronto 305831.038 4834780.388 -79.487021 43.654964 Art Works Art School NaN 0 1 0 0 0 0 Owned by others 196 274
4 95 Lavinia Ave 95 Lavinia Ave M6S 3H9 Toronto 306568.733 4833967.983 -79.477882 43.647658 Swansea Memorial Library NaN 0 0 0 1 0 0 City Owned 199 157
In [59]:
labeled_KM300_facility_df.groupby('label').count()[['add_num', 'address']].describe()
Out[59]:
add_num address
count 300.000000 300.000000
mean 4.656667 4.656667
std 4.698835 4.698835
min 1.000000 1.000000
25% 2.000000 2.000000
50% 3.000000 3.000000
75% 6.000000 6.000000
max 35.000000 35.000000

The smallest cluster has 1 facility, the largest cluster has 43 facility

4. Map Placement Based on Facility Clustering

Now we have 300 facilities clusters and their centroids, our goal is to find an appropriate intersection to place map for each facility cluster as shown in the following plot.

We compared two cases:
Case 1: Select the intersection that is closest to the centroid of a cluster
Case 2: Select the 3 closest intersections to a the centroid of a cluster, then among the five the closest intersections, we select the one with the highest pedestrian volume.

KDTree Algorithm
To find the closest intersection to a cluster centroid, we will use KDTree algorithm.

In [60]:
# plot facility cluster centroids and intersections
fig, ax = plt.subplots(figsize=[20, 12])
intersection_scatter = ax.scatter(intersection_ped_df['longitude'],
                                  intersection_ped_df['latitude'],
                                  c='b', edgecolor='None', alpha=0.9, s=12)
KM300_centroid_scatter = ax.scatter(fac300_centroids[:,1], fac300_centroids[:,0], 
                                    linewidths=2, c='k', s=40)
ax.set_title('300 Facility Clusters Centroids & 2256 Intersections', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=20)
ax.set_ylabel('Latitude', fontsize =20)
ax.legend([intersection_scatter, KM300_centroid_scatter], 
          ['Intersections', 'Facilities Cluster Centroids'], loc='upper right', fontsize = 20)
plt.show()

The small blue dots represent intersections, the big black dots represent facility cluster centroids. Our goal now is to find the closet intersection for each cluster centroid.

4.1: Only Consider Distance

In [61]:
intersection_coords = intersection_ped_df.as_matrix(columns=['latitude', 'longitude'])

# store index of cloest intersection to each cluster centroid
closest_intersection_index =[] 

# find the closest points using scipy KDTree algorithm
for i in range(len(fac300_centroids)):
    distance, index = spatial.KDTree(intersection_coords).query(fac300_centroids[i])
    closest_intersection_index.append(index)

# have a look at the index    
closest_intersection_index  
Out[61]:
[766,
 1389,
 1448,
 272,
 1055,
 1783,
 55,
 429,
 1282,
 235,
 1952,
 1673,
 1300,
 680,
 338,
 1761,
 1567,
 587,
 1897,
 124,
 1032,
 2144,
 913,
 2080,
 157,
 329,
 1432,
 1733,
 1775,
 892,
 340,
 2192,
 554,
 436,
 325,
 1382,
 2026,
 1806,
 510,
 1346,
 1054,
 193,
 14,
 106,
 670,
 2156,
 870,
 772,
 1860,
 186,
 840,
 663,
 355,
 269,
 93,
 1007,
 2104,
 1643,
 645,
 1854,
 146,
 576,
 1805,
 1634,
 541,
 1819,
 954,
 1392,
 520,
 2170,
 2147,
 2097,
 1815,
 1192,
 1391,
 296,
 1102,
 36,
 2024,
 722,
 994,
 2089,
 2199,
 1758,
 652,
 1885,
 180,
 828,
 2058,
 756,
 1686,
 957,
 1712,
 1076,
 136,
 110,
 381,
 760,
 1821,
 319,
 553,
 1293,
 2127,
 1762,
 2236,
 1920,
 1963,
 349,
 2202,
 147,
 964,
 1918,
 1809,
 1071,
 995,
 2135,
 431,
 403,
 1387,
 361,
 188,
 1913,
 1296,
 121,
 2044,
 585,
 770,
 2161,
 2167,
 980,
 1132,
 513,
 1668,
 1451,
 666,
 343,
 480,
 1901,
 853,
 626,
 1880,
 1710,
 263,
 1943,
 1562,
 2205,
 1191,
 1528,
 1645,
 1084,
 2168,
 238,
 737,
 548,
 372,
 2208,
 2171,
 2196,
 2062,
 1216,
 991,
 195,
 729,
 177,
 2141,
 930,
 830,
 367,
 1445,
 952,
 1973,
 960,
 965,
 1008,
 1118,
 860,
 1092,
 1974,
 1181,
 2039,
 1469,
 551,
 402,
 409,
 154,
 502,
 1279,
 1766,
 704,
 1196,
 972,
 1978,
 1507,
 544,
 466,
 165,
 1087,
 834,
 642,
 1115,
 2128,
 415,
 1954,
 1871,
 2151,
 1406,
 1099,
 549,
 557,
 1233,
 1513,
 459,
 1932,
 799,
 1592,
 1204,
 848,
 1701,
 1097,
 2089,
 23,
 1091,
 1607,
 345,
 1861,
 1931,
 478,
 2247,
 1155,
 2021,
 1425,
 2093,
 686,
 170,
 1868,
 792,
 27,
 298,
 841,
 572,
 6,
 914,
 1696,
 322,
 2055,
 1342,
 1822,
 1021,
 406,
 2175,
 53,
 701,
 1291,
 2031,
 562,
 262,
 43,
 1316,
 707,
 1786,
 1807,
 1114,
 1322,
 540,
 2037,
 896,
 1878,
 225,
 288,
 1625,
 1845,
 919,
 1882,
 849,
 331,
 2040,
 1023,
 320,
 1838,
 309,
 2238,
 47,
 398,
 1977,
 2149,
 183,
 1939,
 1129,
 1067,
 2222,
 278,
 464,
 365,
 1486,
 621,
 2187,
 1421,
 1076,
 1902,
 631]
In [62]:
len(closest_intersection_index)
Out[62]:
300
In [63]:
# one intersection may be the closet to multiple cluster centroids
# especially in downtown where facilities are condensed.
len(set(closest_intersection_index))
Out[63]:
298
In [64]:
map_intersection_index = list(set(closest_intersection_index))
len(map_intersection_index)
Out[64]:
298
In [65]:
# selected intersections for map placement
map_intersection_df = intersection_ped_df.iloc[map_intersection_index]
map_intersection_df
Out[65]:
PX main midblock_route side1_route side2_route latitude longitude 8hr_vel_vol 8hr_ped_vol
1807 1825 EGLINTON AVE E NaN TORRANCE RD PRIVATE ACCESS 43.739600 -79.235250 1102.000000 20519.000000
513 515 VICTORIA PARK AVE NaN SURREY AVE NaN 43.736300 -79.307390 12052.000000 9270.500000
1451 1457 BAY ST NaN ST JOSEPH ST NaN 43.665650 -79.387700 8042.500000 8613.000000
6 8 JARVIS ST NaN DUNDAS ST E NaN 43.657052 -79.374531 11929.000000 13787.000000
2055 2108 LAWRENCE AVE W NaN BROOKHAVEN DR NaN 43.704370 -79.496650 6415.000000 5882.000000
1032 1035 FINCH AVE E NaN WARDEN AVE NaN 43.796830 -79.315760 16495.000000 16851.500000
2058 2111 LAWRENCE AVE E NaN BAYVIEW AVE NB RAMP CHEDDINGTON PL 43.727429 -79.381836 4135.500000 4841.000000
631 634 DON MILLS RD NaN DUNCAN MILL RD GRAYDON HALL DR 43.761755 -79.347221 1294.000000 23590.000000
770 773 ROYAL YORK RD NaN MIMICO AVE NaN 43.612260 -79.496720 3616.000000 3547.000000
14 16 KING ST E NaN CHURCH ST NaN 43.649916 -79.374409 11277.000000 13419.000000
1978 2024 ST CLAIR AVE W NaN MOULD AVE PRIVATE ACCESS 43.668240 -79.487040 472.000000 10677.000000
686 689 WARDEN AVE NaN FIRVALLEY CRT NaN 43.703690 -79.277480 4450.000000 4072.500000
154 156 KINGSTON RD NaN WARDEN AVE NaN 43.687770 -79.270928 6854.500000 7618.000000
23 25 BLOOR ST E NaN CHURCH ST NaN 43.671022 -79.383123 20764.000000 23480.000000
225 227 QUEENS PARK NaN QUEENS PARK CRES W NaN 43.666059 -79.393178 576.000000 19428.000000
1562 1568 SHEPPARD AVE E NaN MALVERN ST PROGRESS AVE 43.794680 -79.234910 604.000000 16692.000000
1181 1184 MORNINGSIDE AVE NaN CORONATION DR NaN 43.761906 -79.183615 3682.000000 3684.500000
540 542 DUNDAS ST W NaN DENISON AVE NaN 43.652040 -79.402270 4598.000000 9481.000000
541 543 QUEEN ST E NaN BROADVIEW AVE NaN 43.658890 -79.349820 7268.500000 7913.500000
1054 1057 KENNEDY RD NaN BERTRAND AVE NaN 43.737470 -79.270110 7426.500000 8091.500000
1567 1573 KINGSTON RD NaN LAWSON RD OLD KINGSTON RD 43.784540 -79.168720 122.000000 7477.000000
2080 2134 DUFFERIN ST NaN SASKATCHEWAN RD NaN 43.632631 -79.425127 287.000000 7822.000000
36 38 YONGE ST NaN CARLTON ST COLLEGE ST 43.661369 -79.383094 21276.500000 25370.000000
549 551 QUEEN ST E NaN BEECH AVE NaN 43.672690 -79.287560 2129.000000 9048.000000
551 553 QUEEN ST W NaN NIAGARA ST NaN 43.646020 -79.409930 1231.000000 7820.000000
2089 2143 CHERRY ST NaN COMMISSIONERS ST NaN 43.645257 -79.352682 517.000000 5622.000000
554 556 QUEEN ST W NaN DOVERCOURT RD NaN 43.643510 -79.422400 5683.000000 5695.000000
43 45 YONGE ST NaN SHAFTESBURY AVE ALCORN AVE 43.682023 -79.391574 8946.500000 9945.000000
557 560 THE QUEENSWAY NaN COLBORNE LODGE DR NaN 43.639530 -79.459480 642.000000 14367.000000
349 351 DANFORTH AVE NaN WOODBINE AVE NaN 43.685671 -79.312749 16523.666667 11939.000000
... ... ... ... ... ... ... ... ... ...
972 975 BURNHAMTHORPE RD NaN OLD BURNHAMTHORPE RD SATURN RD 43.642600 -79.575920 537.000000 16688.000000
1954 1994 LAKE SHORE BLVD W NaN LEGION RD PRIVATE ACCESS 43.620530 -79.483260 6890.666667 3760.666667
1486 1492 STEELES AVE E NaN MCCOWAN RD NaN 43.831070 -79.275160 14241.000000 16488.000000
464 466 EGLINTON AVE W NaN RICHARDSON AVE NaN 43.690970 -79.471490 7435.000000 7079.500000
2040 2092 SHEPPARD AVE E NaN HERONS HILL WAY NaN 43.775533 -79.335108 268.000000 21184.000000
466 468 EGLINTON AVE W NaN WESTON RD NaN 43.686980 -79.489510 11947.000000 11831.500000
980 983 LAWRENCE AVE E NaN MANSE RD NaN 43.770628 -79.176258 5794.000000 5590.500000
544 546 QUEEN ST E NaN JONES AVE NaN 43.662690 -79.332790 1549.000000 8225.000000
2127 2193 THE DONWAY W NaN OVERLAND DR PRIVATE ACCESS 43.732959 -79.346926 1231.000000 6333.000000
478 480 ST CLAIR AVE W NaN BATHURST ST NaN 43.683140 -79.418280 17377.333333 13834.666667
991 994 VICTORIA PARK AVE NaN VAN HORNE AVE FERNCREST GT 43.789270 -79.329570 674.000000 16025.000000
480 482 YONGE ST NaN BRIAR HILL AVE SHERWOOD AVE 43.713560 -79.399800 5249.000000 16438.000000
994 997 ELLESMERE RD NaN BOROUGH APPROACH E NaN 43.771300 -79.255660 7960.500000 7392.000000
995 998 FINCH AVE W NaN GOLDFINCH CRT TORRESDALE AVE 43.772330 -79.450930 9805.000000 10297.500000
2021 2071 MARTIN GROVE RD NaN VULCAN ST PRIVATE ACCESS 43.705260 -79.577140 29.000000 10722.000000
849 852 SPADINA RD NaN LONSDALE RD NaN 43.688690 -79.412650 4816.000000 8695.000000
2024 2075 BATHURST ST NaN DOUGLAS AVE COVINGTON RD 43.720880 -79.430140 10019.500000 10367.000000
1513 1519 MARTIN GROVE RD NaN SILVERSTONE DR NaN 43.750920 -79.598590 796.000000 9513.000000
2026 2078 THE POND RD NaN SENTINEL RD PRIVATE ACCESS 43.770020 -79.502480 9454.000000 11330.000000
1007 1010 SHEPPARD AVE W NaN ARLETA AVE NORTHOVER ST 43.741570 -79.501490 9638.000000 9336.500000
1008 1011 EGLINTON AVE W NaN BICKNELL AVE MUNICIPAL DR 43.689690 -79.477170 9071.000000 10760.500000
680 683 MILLWOOD RD NaN MCRAE DR NaN 43.704803 -79.368108 4178.000000 4737.500000
2037 2089 LAKE SHORE BLVD W 25m EAST OF BUDAPEST LANE NaN 43.637040 -79.447530 236.000000 14539.000000
502 504 AVENUE RD NaN WOBURN AVE NaN 43.725400 -79.416870 1033.000000 19834.000000
2039 2091 MIDLAND AVE 175m NORTH OF MCNICOLL AVE PRIVATE ACCESS 43.816080 -79.292880 85.000000 10597.000000
1528 1534 WILMINGTON AVE NaN OVERBROOK PL NaN 43.762850 -79.456900 1028.000000 8680.000000
2044 2097 DUPONT ST NaN EDWIN AVE NaN 43.664910 -79.455490 7784.333333 4497.000000
1021 1024 ST CLAIR AVE W NaN ARLINGTON AVE NaN 43.681100 -79.428850 6006.000000 6369.500000
510 512 VICTORIA PARK AVE NaN CASSANDRA BLVD NaN 43.754340 -79.314000 10563.500000 10763.500000
1023 1026 BIRCHMOUNT RD 350m NORTH OF BONIS AVE PRIVATE ACCESS 43.786620 -79.300660 230.000000 14755.000000

298 rows × 9 columns

In [66]:
# Check the pedestrian volume of the select 296 intersections
map_intersection_df['8hr_ped_vol'].describe()
Out[66]:
count      298.000000
mean     11324.208110
std       6014.470753
min         23.000000
25%       7146.750000
50%      10124.000000
75%      14770.000000
max      39750.000000
Name: 8hr_ped_vol, dtype: float64

The average 8 hour pedestrian volume is 11246.8 for the select 296 intersections

In [67]:
# Plot the selected 296 Intersections for Map Placement
fig, ax = plt.subplots(figsize=[20, 12])
intersection_scatter = ax.scatter(intersection_ped_df['longitude'],
                                  intersection_ped_df['latitude'],
                                  c='b', edgecolor='None', alpha=0.9, s=8)
facility_scatter = ax.scatter(facility_df['longitude'],  facility_df['latitude'], c=KM300_label,
                              cmap = cm.Dark2, edgecolor='None', alpha=0.7, s=120)
map_intersection_scatter = ax.scatter(map_intersection_df['longitude'],
                                      map_intersection_df['latitude'], 
                                      marker='x', linewidths=2, c='k', s=30)
ax.set_title('300 Facility Clusters & 296 Closest Intersections for Map Placment', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=24)
ax.set_ylabel('Latitude', fontsize = 24)
ax.legend([intersection_scatter, facility_scatter, map_intersection_scatter], 
          ['Intersections', 'Facilities', 'Selected Intersections for Map Placement'], 
          loc='lower right', fontsize = 20)
plt.show()

4.2 Consider Both Distance and Pedestrian Volume

In [68]:
# store index of cloest 3 intersections to each cluster centroid
cloest_intersection_index2 =[] 

# Find the closest 3 intersections using KDTree
for i in range(len(fac300_centroids)):
    distance, index = spatial.KDTree(intersection_coords).query(fac300_centroids[i], 3)
    cloest_intersection_index2.append(index)

# The returned index is a list of array
cloest_intersection_index2
Out[68]:
[array([ 766, 1700,  486]),
 array([1389, 1262,  283]),
 array([1448, 1180, 1064]),
 array([272, 273, 271]),
 array([1055,  144, 1190]),
 array([1783, 2085, 2083]),
 array([55, 56, 54]),
 array([429, 578, 579]),
 array([1282,  471,  638]),
 array([235, 764, 236]),
 array([1952,  258,  257]),
 array([1673, 1548, 1576]),
 array([1300, 1685,  832]),
 array([680, 681, 169]),
 array([338, 336, 335]),
 array([1761, 1760, 2075]),
 array([1567, 1675, 2247]),
 array([587, 392, 961]),
 array([1897, 1288,   44]),
 array([ 124, 1787,  125]),
 array([1032, 1215, 2001]),
 array([2144,  880, 1013]),
 array([ 913, 1350,  623]),
 array([2080, 1953, 1443]),
 array([ 157, 1658, 1900]),
 array([ 329, 1676, 1473]),
 array([1432,  859, 1798]),
 array([1733, 1841,  813]),
 array([1775, 1688, 1378]),
 array([ 892, 2228, 1970]),
 array([340, 341, 475]),
 array([2192,  595,  854]),
 array([ 554, 2029, 1328]),
 array([436,  98, 503]),
 array([ 325, 1302, 1840]),
 array([1382, 1121,  453]),
 array([2026, 1996, 2077]),
 array([1806,  162, 1022]),
 array([ 510, 2011,  515]),
 array([1346, 1062, 1475]),
 array([1054, 1749,  457]),
 array([ 193,  950, 1157]),
 array([14, 13, 15]),
 array([106, 107, 379]),
 array([ 670,  976, 1165]),
 array([2156, 1079,  898]),
 array([ 870,  424, 1695]),
 array([ 772, 1311, 1335]),
 array([1860, 1251, 1564]),
 array([ 186,  573, 1856]),
 array([840, 846, 604]),
 array([ 663, 1731,  658]),
 array([355, 356, 416]),
 array([ 269, 1397,  835]),
 array([  93,   94, 1305]),
 array([1007, 2038, 1423]),
 array([2104, 1484, 1639]),
 array([1643,  483,  383]),
 array([ 645, 1144,  644]),
 array([1854,  194, 1271]),
 array([ 146, 2124,  461]),
 array([ 576, 1123, 2140]),
 array([1805,  640,  641]),
 array([1634,   71,   70]),
 array([541, 289, 814]),
 array([1819, 1875, 1429]),
 array([954, 699, 875]),
 array([1392, 1580, 1800]),
 array([ 520,  521, 1313]),
 array([2170, 1339, 1367]),
 array([2147, 1275, 1772]),
 array([2097, 1500, 1968]),
 array([1815,  677,  676]),
 array([1192,  716, 1962]),
 array([1391,  561, 1724]),
 array([ 296,  294, 1859]),
 array([1102, 1034, 1103]),
 array([ 36,  66, 864]),
 array([2024,  421, 1818]),
 array([ 722,  721, 1320]),
 array([ 994,  695, 1808]),
 array([2089,  817,  207]),
 array([2199, 1784, 2200]),
 array([1758,  774,  703]),
 array([ 652, 1128,  469]),
 array([1885,  547, 2008]),
 array([ 180,  178, 1904]),
 array([ 828,  179, 2227]),
 array([2058, 1914, 1640]),
 array([ 756,  755, 2054]),
 array([1686,  368,  369]),
 array([ 957,  808, 1506]),
 array([1712,  277,  276]),
 array([1076, 1295, 1294]),
 array([ 136, 1362, 1006]),
 array([110, 109, 600]),
 array([ 381, 2034, 1020]),
 array([760, 759, 890]),
 array([1821, 2145,  945]),
 array([ 319, 1671, 1235]),
 array([ 553,  552, 1936]),
 array([1293, 1213, 1046]),
 array([2127,  399, 2177]),
 array([1762, 2049, 1680]),
 array([2236,  966,  378]),
 array([1920, 2209, 1998]),
 array([1963,  232,  767]),
 array([349, 363, 350]),
 array([2202, 2203, 2201]),
 array([ 147, 2159,  148]),
 array([ 964, 1714, 1210]),
 array([1918,  292,  475]),
 array([1809, 2108, 1568]),
 array([1071,  102,  101]),
 array([ 995, 1218,  672]),
 array([2135, 1174, 1368]),
 array([431, 581, 777]),
 array([403, 888, 357]),
 array([1387,  259, 1569]),
 array([ 361,  351, 1865]),
 array([ 188, 2069, 2188]),
 array([1913, 2006, 1159]),
 array([1296, 1047,  949]),
 array([ 121, 1755, 1725]),
 array([2044,  842, 1543]),
 array([ 585,  634, 1125]),
 array([ 770,  230, 2082]),
 array([2161, 1285,  395]),
 array([2167, 1623, 1110]),
 array([ 980, 1185, 1163]),
 array([1132, 1136, 1435]),
 array([ 513, 1579,  512]),
 array([1668,  445, 1707]),
 array([1451,   67, 1412]),
 array([ 666, 1960,  661]),
 array([343, 342, 344]),
 array([480,  49, 851]),
 array([1901,  323, 1060]),
 array([ 853,  128, 1377]),
 array([ 626, 1043, 1853]),
 array([1880, 2154,  678]),
 array([1710,  122,  506]),
 array([263, 264,  79]),
 array([1943, 1327,  538]),
 array([1562, 1776, 1711]),
 array([2205,  202, 2204]),
 array([1191,  668,  669]),
 array([1528,  629,  736]),
 array([1645,  427,  426]),
 array([1084,  488,  489]),
 array([2168, 1551,  803]),
 array([238, 237, 239]),
 array([ 737, 1166, 1770]),
 array([ 548,  547, 1979]),
 array([ 372, 2229,  290]),
 array([2208, 2121, 1587]),
 array([2171, 1869,  207]),
 array([2196, 1473,  329]),
 array([2062, 2166, 1427]),
 array([1216, 1429, 1383]),
 array([ 991, 1248, 1012]),
 array([ 195,  900, 1386]),
 array([ 729, 1208, 2139]),
 array([ 177, 1122,   88]),
 array([2141,  185,  843]),
 array([ 930,  958, 1169]),
 array([ 830,  829, 1902]),
 array([367, 282, 283]),
 array([1445, 1343, 1470]),
 array([ 952, 1599,  432]),
 array([1973, 1474,   22]),
 array([ 960, 1810, 1796]),
 array([ 965, 1921,  105]),
 array([1008,  467,  465]),
 array([1118,   97, 1735]),
 array([ 860, 1440, 2250]),
 array([1092,  807,  806]),
 array([1974, 1613,  494]),
 array([1181, 2114, 2112]),
 array([2039, 1559, 2002]),
 array([1469,  348,  347]),
 array([ 551, 1329,  535]),
 array([402, 404, 929]),
 array([409, 408, 410]),
 array([ 154,  358, 2051]),
 array([502, 118, 117]),
 array([1279,  992,  412]),
 array([1766, 1338, 1790]),
 array([ 704,  705, 1752]),
 array([1196, 1702, 1622]),
 array([ 972,  871, 1863]),
 array([1978,  492, 1363]),
 array([1507, 1046, 2063]),
 array([544, 285, 545]),
 array([466, 576, 575]),
 array([165, 167, 168]),
 array([1087, 1263,  265]),
 array([ 834,  836, 1958]),
 array([ 642, 1080,  643]),
 array([1115,  647, 1767]),
 array([2128, 1375,  437]),
 array([ 415, 1971, 2185]),
 array([1954,  986,  228]),
 array([1871, 1177, 1560]),
 array([2151, 1029, 1109]),
 array([1406,   45,   46]),
 array([1099, 1890,  714]),
 array([ 549, 1979, 1658]),
 array([ 557, 1450, 1792]),
 array([1233, 1656,  614]),
 array([1513, 2130, 1529]),
 array([ 459, 2018, 1650]),
 array([1932, 1930, 1934]),
 array([ 799, 1522,  796]),
 array([1592, 1721, 1660]),
 array([1204,  619, 1158]),
 array([ 848,  841, 1059]),
 array([1701,  500,  143]),
 array([1097, 1738,  123]),
 array([2089,  817, 1553]),
 array([ 23,  12, 223]),
 array([1091,  802,  801]),
 array([1607,  450,  448]),
 array([345, 346, 344]),
 array([1861,  905,  702]),
 array([1931, 1825, 1929]),
 array([478, 304, 482]),
 array([2247, 1567, 1465]),
 array([1155, 1001, 1154]),
 array([2021, 1083,  786]),
 array([1425, 2150, 1249]),
 array([2093,  809, 1203]),
 array([ 686, 1820, 1846]),
 array([170, 169,  88]),
 array([1868,  567, 1324]),
 array([ 792,  584, 1063]),
 array([27, 28, 57]),
 array([ 298, 2180, 1151]),
 array([841, 387, 388]),
 array([572, 327, 326]),
 array([   6, 2033,    5]),
 array([ 914,  920, 1814]),
 array([1696, 1763, 1549]),
 array([ 322,  339, 1896]),
 array([2055, 1959, 1345]),
 array([1342, 1836, 2169]),
 array([1822,  206,  205]),
 array([1021,  483,  484]),
 array([ 406,  405, 1184]),
 array([2175,  295,  199]),
 array([53, 52, 54]),
 array([ 701, 1393,  418]),
 array([1291,  354, 2134]),
 array([2031,  507,  279]),
 array([ 562,  563, 1460]),
 array([ 262,  261, 1413]),
 array([  43, 1287, 1911]),
 array([1316,  591,  592]),
 array([707, 706, 708]),
 array([1786,  676, 1815]),
 array([1807,  190,  189]),
 array([1114,  290,  541]),
 array([1322, 1309,  234]),
 array([540, 276, 823]),
 array([2037,  556, 1855]),
 array([896, 673, 674]),
 array([1878,   34,   19]),
 array([225,  84,  83]),
 array([288, 542, 287]),
 array([1625, 1525, 1199]),
 array([1845,  630, 1655]),
 array([919, 248, 249]),
 array([1882, 1831, 2029]),
 array([ 849, 1372,  477]),
 array([ 331,  332, 1313]),
 array([2040,  903,  182]),
 array([1023, 1057,  984]),
 array([ 320, 1070, 1895]),
 array([1838,  865, 1230]),
 array([309, 310, 308]),
 array([2238,  679,  682]),
 array([  47, 1374, 1411]),
 array([ 398, 2212,  733]),
 array([1977, 1793,  550]),
 array([2149,  872,  871]),
 array([183, 493, 521]),
 array([1939,  618, 1385]),
 array([1129,  989, 1925]),
 array([1067,  740, 1922]),
 array([2222, 1606,  598]),
 array([ 278, 1748, 1747]),
 array([ 464, 1557,  465]),
 array([ 365,  364, 1352]),
 array([1486, 1533, 1699]),
 array([621, 401, 400]),
 array([2187, 1245,  531]),
 array([1421, 1326, 1328]),
 array([1076, 1295,  734]),
 array([1902, 1059,  637]),
 array([ 631,  894, 1350])]
In [69]:
len(cloest_intersection_index2)
Out[69]:
300
In [70]:
# create an empty dataframe with same schema as the intersection_ped_df
map_intersection_df2 = intersection_ped_df[intersection_ped_df['PX']<0]
map_intersection_df2
Out[70]:
PX main midblock_route side1_route side2_route latitude longitude 8hr_vel_vol 8hr_ped_vol
In [71]:
print map_intersection_df2.columns
print map_intersection_df2.dtypes
print map_intersection_df2.empty
Index([u'PX', u'main', u'midblock_route', u'side1_route', u'side2_route', u'latitude', u'longitude', u'8hr_vel_vol', u'8hr_ped_vol'], dtype='object')
PX                  int64
main               object
midblock_route     object
side1_route        object
side2_route        object
latitude          float64
longitude         float64
8hr_vel_vol       float64
8hr_ped_vol       float64
dtype: object
True
In [72]:
for ary in cloest_intersection_index2:
    # five cloest intersection to a facility centroid
    temp_df = intersection_ped_df.iloc[ary]  
    # select the max pedestrian volume intersection
    # among the 3 cloest intersections
    max_ped_df = temp_df.ix[temp_df['8hr_ped_vol'].idxmax()]
    
    # appending the selected intersection row to map_intersection_df2
    map_intersection_df2 = map_intersection_df2.append(max_ped_df)

# 300 intersections, duplicates may exist
map_intersection_df2
Out[72]:
PX main midblock_route side1_route side2_route latitude longitude 8hr_vel_vol 8hr_ped_vol
486 488 DUFFERIN ST NaN ST CLAIR AVE W NaN 43.677950 -79.443060 14866.000000 10850.000000
1262 1265 EASTERN AVE NaN KNOX AVE NaN 43.662540 -79.325230 404.000000 10111.000000
1180 1183 FINCH AVE E NaN AU LARGE BLVD SENECA HILL DR 43.793650 -79.350420 9709.000000 27994.000000
271 273 SPADINA AVE NaN KING ST W NaN 43.645451 -79.395002 18498.000000 20677.000000
144 146 KINGSTON RD NaN GUILDWOOD PKWY CROMWELL RD 43.749649 -79.205992 258.000000 24763.000000
1783 1797 KIPLING AVE NaN GENTHORN AVE PRIVATE ACCESS 43.723470 -79.572330 4689.500000 6409.500000
54 56 YONGE ST NaN RANLEIGH AVE NaN 43.727864 -79.402930 5040.000000 18308.000000
429 431 WESTON RD NaN LAWRENCE AVE W NaN 43.700140 -79.516240 11069.500000 11656.000000
638 641 ST CLAIR AVE E NaN HERRON AVE PRIVATE ACCESS 43.710968 -79.285930 1010.000000 15404.000000
235 237 LAKE SHORE BLVD W NaN KIPLING AVE COLONEL SAMUEL SMITH PARK DR 43.598158 -79.516936 11514.333333 6920.666667
257 259 SHERBOURNE ST NaN GERRARD ST E NaN 43.661228 -79.372218 2040.000000 15254.000000
1548 1554 STEELES AVE E NaN MIDDLEFIELD RD NaN 43.832740 -79.267940 711.000000 24186.000000
832 835 HARBORD ST NaN GRACE ST NaN 43.659960 -79.417470 1833.000000 9549.000000
169 171 BAYVIEW AVE NaN MILLWOOD RD NaN 43.704258 -79.374587 2599.000000 18168.000000
335 337 BLOOR ST W NaN PRINCE EDWARD DR N PRINCE EDWARD DR S 43.648666 -79.505433 1187.000000 21167.000000
1760 1773 ALNESS ST NaN MARTIN ROSS AVE NaN 43.774280 -79.473370 76.000000 10105.000000
1567 1573 KINGSTON RD NaN LAWSON RD OLD KINGSTON RD 43.784540 -79.168720 122.000000 7477.000000
587 590 KEELE ST NaN DONALD AVE NaN 43.684930 -79.472780 1070.000000 10579.000000
44 46 YONGE ST NaN ST CLAIR AVE E ST CLAIR AVE W 43.688075 -79.394098 24060.750000 18858.250000
124 126 YONGE ST NaN EMPRESS AVE PARK HOME AVE 43.769295 -79.412931 12257.000000 17199.500000
1215 1218 FINCH AVE E NaN BRIDLETOWNE CRCL NaN 43.797550 -79.312320 1305.000000 22412.000000
880 883 DUNDAS ST W NaN NEILSON DR WEST MALL CRES 43.627380 -79.562370 15983.500000 14006.000000
623 626 DON MILLS RD NaN YORK MILLS RD NaN 43.755793 -79.347046 4197.000000 40868.000000
2080 2134 DUFFERIN ST NaN SASKATCHEWAN RD NaN 43.632631 -79.425127 287.000000 7822.000000
1658 1669 KINGSTON RD NaN SCARBOROUGH RD NaN 43.680560 -79.287050 1624.000000 15979.000000
329 331 BLOOR ST W NaN RUNNYMEDE RD NaN 43.651148 -79.476227 11003.000000 15025.000000
859 862 SHEPPARD AVE E NaN MIDLAND AVE NaN 43.785370 -79.278550 18209.000000 16636.500000
813 816 EASTERN AVE NaN CHERRY ST SUMACH ST 43.653900 -79.358500 143.000000 12774.000000
1378 1382 NEILSON RD NaN SEWELLS RD TAPSCOTT RD 43.806570 -79.218650 1439.000000 14304.000000
1970 2011 YONGE ST 35m SOUTH OF YORKVILLE AVE NaN 43.671630 -79.387340 746.000000 12272.000000
... ... ... ... ... ... ... ... ... ...
1845 1869 KEELE ST NaN DOVEHOUSE AVE NaN 43.749610 -79.487580 613.000000 15363.000000
248 250 PARLIAMENT ST NaN CARLTON ST NaN 43.664269 -79.367958 5163.000000 9877.000000
2029 2081 KING ST W NaN JOE SHUSTER WAY NaN 43.639730 -79.423580 847.000000 8970.000000
477 479 ST CLAIR AVE W NaN SPADINA RD NaN 43.684620 -79.411160 2028.000000 17770.000000
331 333 BLOOR ST W NaN JANE ST NaN 43.649331 -79.484452 10494.500000 10772.500000
2040 2092 SHEPPARD AVE E NaN HERONS HILL WAY NaN 43.775533 -79.335108 268.000000 21184.000000
984 987 BIRCHMOUNT RD NaN HUNTINGWOOD DR NaN 43.790450 -79.302270 1085.000000 20516.000000
320 322 BLOOR ST W NaN CHRISTIE ST GRACE ST 43.663460 -79.418837 7212.000000 14556.000000
1838 1861 FINCH AVE E NaN LISZT GT TRUDY RD 43.790840 -79.363060 938.000000 19310.000000
310 312 BATHURST ST NaN GLENCAIRN AVE NaN 43.712327 -79.428131 8465.500000 10850.500000
679 682 LAIRD DR NaN MILLWOOD RD NaN 43.703231 -79.360591 157.000000 21057.000000
47 49 YONGE ST NaN EGLINTON AVE E EGLINTON AVE W 43.706750 -79.398311 32728.000000 23270.000000
733 736 LESLIE ST NaN TALWOOD DR NaN 43.740880 -79.357450 248.000000 16088.000000
550 552 QUEEN ST W NaN AUGUSTA AVE NaN 43.648060 -79.399660 6250.000000 7737.000000
871 874 BURNHAMTHORPE RD NaN RENFORTH DR NaN 43.643357 -79.572448 396.000000 22819.000000
493 495 JANE ST NaN ST CLAIR AVE W NaN 43.667210 -79.491740 633.000000 24579.000000
618 621 DON MILLS RD NaN ST DENNIS DR NaN 43.716546 -79.337275 872.000000 25442.000000
989 992 FINCH AVE W NaN MARTIN GROVE RD NaN 43.736810 -79.591700 1699.000000 19119.000000
1067 1070 FINCH AVE E NaN HEATHVIEW AVE TOLLERTON AVE 43.786270 -79.386630 632.000000 19474.000000
598 601 FINCH AVE W NaN KEELE ST NaN 43.763481 -79.490976 3608.000000 33097.000000
278 280 SPADINA AVE NaN HARBORD ST NaN 43.663065 -79.402104 13012.000000 14213.000000
465 467 EGLINTON AVE W NaN KEELE ST TRETHEWEY DR 43.690170 -79.475010 20953.000000 11985.666667
365 367 COXWELL AVE NaN FAIRFORD AVE GERRARD ST E 43.675329 -79.320299 1822.000000 11606.000000
1486 1492 STEELES AVE E NaN MCCOWAN RD NaN 43.831070 -79.275160 14241.000000 16488.000000
400 402 LAWRENCE AVE E NaN DON MILLS RD NaN 43.737159 -79.343417 2471.000000 30555.000000
531 533 SHEPPARD AVE W NaN JANE ST NaN 43.739420 -79.513190 15739.333333 9085.666667
1326 1329 QUEEN ST W NaN DUFFERIN ST NaN 43.642220 -79.428630 9125.000000 8678.000000
734 737 LESLIE ST NaN SOUTHWELL DR DENLOW BLVD 43.746000 -79.358500 387.000000 18403.000000
1059 1062 BLOOR ST W NaN SYMINGTON AVE STERLING RD 43.657300 -79.447900 8054.333333 5977.666667
631 634 DON MILLS RD NaN DUNCAN MILL RD GRAYDON HALL DR 43.761755 -79.347221 1294.000000 23590.000000

300 rows × 9 columns

In [73]:
# drop duplicates, and 289 unique intersections were selected 
# for map placement for the 300 facility clusters
map_intersection_df2 = map_intersection_df2.drop_duplicates()
map_intersection_df2
Out[73]:
PX main midblock_route side1_route side2_route latitude longitude 8hr_vel_vol 8hr_ped_vol
486 488 DUFFERIN ST NaN ST CLAIR AVE W NaN 43.677950 -79.443060 14866.000000 10850.000000
1262 1265 EASTERN AVE NaN KNOX AVE NaN 43.662540 -79.325230 404.000000 10111.000000
1180 1183 FINCH AVE E NaN AU LARGE BLVD SENECA HILL DR 43.793650 -79.350420 9709.000000 27994.000000
271 273 SPADINA AVE NaN KING ST W NaN 43.645451 -79.395002 18498.000000 20677.000000
144 146 KINGSTON RD NaN GUILDWOOD PKWY CROMWELL RD 43.749649 -79.205992 258.000000 24763.000000
1783 1797 KIPLING AVE NaN GENTHORN AVE PRIVATE ACCESS 43.723470 -79.572330 4689.500000 6409.500000
54 56 YONGE ST NaN RANLEIGH AVE NaN 43.727864 -79.402930 5040.000000 18308.000000
429 431 WESTON RD NaN LAWRENCE AVE W NaN 43.700140 -79.516240 11069.500000 11656.000000
638 641 ST CLAIR AVE E NaN HERRON AVE PRIVATE ACCESS 43.710968 -79.285930 1010.000000 15404.000000
235 237 LAKE SHORE BLVD W NaN KIPLING AVE COLONEL SAMUEL SMITH PARK DR 43.598158 -79.516936 11514.333333 6920.666667
257 259 SHERBOURNE ST NaN GERRARD ST E NaN 43.661228 -79.372218 2040.000000 15254.000000
1548 1554 STEELES AVE E NaN MIDDLEFIELD RD NaN 43.832740 -79.267940 711.000000 24186.000000
832 835 HARBORD ST NaN GRACE ST NaN 43.659960 -79.417470 1833.000000 9549.000000
169 171 BAYVIEW AVE NaN MILLWOOD RD NaN 43.704258 -79.374587 2599.000000 18168.000000
335 337 BLOOR ST W NaN PRINCE EDWARD DR N PRINCE EDWARD DR S 43.648666 -79.505433 1187.000000 21167.000000
1760 1773 ALNESS ST NaN MARTIN ROSS AVE NaN 43.774280 -79.473370 76.000000 10105.000000
1567 1573 KINGSTON RD NaN LAWSON RD OLD KINGSTON RD 43.784540 -79.168720 122.000000 7477.000000
587 590 KEELE ST NaN DONALD AVE NaN 43.684930 -79.472780 1070.000000 10579.000000
44 46 YONGE ST NaN ST CLAIR AVE E ST CLAIR AVE W 43.688075 -79.394098 24060.750000 18858.250000
124 126 YONGE ST NaN EMPRESS AVE PARK HOME AVE 43.769295 -79.412931 12257.000000 17199.500000
1215 1218 FINCH AVE E NaN BRIDLETOWNE CRCL NaN 43.797550 -79.312320 1305.000000 22412.000000
880 883 DUNDAS ST W NaN NEILSON DR WEST MALL CRES 43.627380 -79.562370 15983.500000 14006.000000
623 626 DON MILLS RD NaN YORK MILLS RD NaN 43.755793 -79.347046 4197.000000 40868.000000
2080 2134 DUFFERIN ST NaN SASKATCHEWAN RD NaN 43.632631 -79.425127 287.000000 7822.000000
1658 1669 KINGSTON RD NaN SCARBOROUGH RD NaN 43.680560 -79.287050 1624.000000 15979.000000
329 331 BLOOR ST W NaN RUNNYMEDE RD NaN 43.651148 -79.476227 11003.000000 15025.000000
859 862 SHEPPARD AVE E NaN MIDLAND AVE NaN 43.785370 -79.278550 18209.000000 16636.500000
813 816 EASTERN AVE NaN CHERRY ST SUMACH ST 43.653900 -79.358500 143.000000 12774.000000
1378 1382 NEILSON RD NaN SEWELLS RD TAPSCOTT RD 43.806570 -79.218650 1439.000000 14304.000000
1970 2011 YONGE ST 35m SOUTH OF YORKVILLE AVE NaN 43.671630 -79.387340 746.000000 12272.000000
... ... ... ... ... ... ... ... ... ...
542 544 QUEEN ST E NaN LOGAN AVE NaN 43.660521 -79.342477 3049.000000 11325.000000
1525 1531 KINGSTON RD NaN RYLANDER BLVD PRIVATE ACCESS 43.797170 -79.149140 62.000000 16762.000000
1845 1869 KEELE ST NaN DOVEHOUSE AVE NaN 43.749610 -79.487580 613.000000 15363.000000
248 250 PARLIAMENT ST NaN CARLTON ST NaN 43.664269 -79.367958 5163.000000 9877.000000
477 479 ST CLAIR AVE W NaN SPADINA RD NaN 43.684620 -79.411160 2028.000000 17770.000000
331 333 BLOOR ST W NaN JANE ST NaN 43.649331 -79.484452 10494.500000 10772.500000
2040 2092 SHEPPARD AVE E NaN HERONS HILL WAY NaN 43.775533 -79.335108 268.000000 21184.000000
984 987 BIRCHMOUNT RD NaN HUNTINGWOOD DR NaN 43.790450 -79.302270 1085.000000 20516.000000
320 322 BLOOR ST W NaN CHRISTIE ST GRACE ST 43.663460 -79.418837 7212.000000 14556.000000
1838 1861 FINCH AVE E NaN LISZT GT TRUDY RD 43.790840 -79.363060 938.000000 19310.000000
310 312 BATHURST ST NaN GLENCAIRN AVE NaN 43.712327 -79.428131 8465.500000 10850.500000
679 682 LAIRD DR NaN MILLWOOD RD NaN 43.703231 -79.360591 157.000000 21057.000000
47 49 YONGE ST NaN EGLINTON AVE E EGLINTON AVE W 43.706750 -79.398311 32728.000000 23270.000000
733 736 LESLIE ST NaN TALWOOD DR NaN 43.740880 -79.357450 248.000000 16088.000000
550 552 QUEEN ST W NaN AUGUSTA AVE NaN 43.648060 -79.399660 6250.000000 7737.000000
493 495 JANE ST NaN ST CLAIR AVE W NaN 43.667210 -79.491740 633.000000 24579.000000
618 621 DON MILLS RD NaN ST DENNIS DR NaN 43.716546 -79.337275 872.000000 25442.000000
989 992 FINCH AVE W NaN MARTIN GROVE RD NaN 43.736810 -79.591700 1699.000000 19119.000000
1067 1070 FINCH AVE E NaN HEATHVIEW AVE TOLLERTON AVE 43.786270 -79.386630 632.000000 19474.000000
598 601 FINCH AVE W NaN KEELE ST NaN 43.763481 -79.490976 3608.000000 33097.000000
278 280 SPADINA AVE NaN HARBORD ST NaN 43.663065 -79.402104 13012.000000 14213.000000
465 467 EGLINTON AVE W NaN KEELE ST TRETHEWEY DR 43.690170 -79.475010 20953.000000 11985.666667
365 367 COXWELL AVE NaN FAIRFORD AVE GERRARD ST E 43.675329 -79.320299 1822.000000 11606.000000
1486 1492 STEELES AVE E NaN MCCOWAN RD NaN 43.831070 -79.275160 14241.000000 16488.000000
400 402 LAWRENCE AVE E NaN DON MILLS RD NaN 43.737159 -79.343417 2471.000000 30555.000000
531 533 SHEPPARD AVE W NaN JANE ST NaN 43.739420 -79.513190 15739.333333 9085.666667
1326 1329 QUEEN ST W NaN DUFFERIN ST NaN 43.642220 -79.428630 9125.000000 8678.000000
734 737 LESLIE ST NaN SOUTHWELL DR DENLOW BLVD 43.746000 -79.358500 387.000000 18403.000000
1059 1062 BLOOR ST W NaN SYMINGTON AVE STERLING RD 43.657300 -79.447900 8054.333333 5977.666667
631 634 DON MILLS RD NaN DUNCAN MILL RD GRAYDON HALL DR 43.761755 -79.347221 1294.000000 23590.000000

289 rows × 9 columns

In [74]:
# Check the pedestrian volume of the select intersections
map_intersection_df2['8hr_ped_vol'].describe()
Out[74]:
count      289.000000
mean     15847.160323
std       6746.279254
min       5654.000000
25%      11098.000000
50%      14775.000000
75%      19116.333333
max      40868.000000
Name: 8hr_ped_vol, dtype: float64

If we consider both distance and both pedestrian volume, the average 8 hour pedestrian volume is 16004.4 for the selected 286 intersections. In other words intersections selected by this method has in average 4758 more pedestrian volume or 42% pedestrian volume in 8 Hr period compare to the intersection selected by only considering distance. This is a huge difference. Therefore, we will select intersections using the second method, which considers both distance and pedestrian volume.

In [75]:
# Plot select 289 Intersections that for Map Placement
fig, ax = plt.subplots(figsize=[20, 12])
intersection_scatter = ax.scatter(intersection_ped_df['longitude'],
                                  intersection_ped_df['latitude'],
                                  c='b', edgecolor='None', alpha=0.9, s=8)
facility_scatter = ax.scatter(facility_df['longitude'],  facility_df['latitude'], c=KM300_label,
                              cmap = cm.Dark2, edgecolor='None', alpha=0.7, s=120)
map_intersection_scatter = ax.scatter(map_intersection_df2['longitude'],
                                      map_intersection_df2['latitude'], 
                                      marker='x', linewidths=2, c='k', s=30)
ax.set_title('300 Facility Clusters & 289 Intersections for Map Placment', fontsize = 30)
ax.set_xlabel('Longitude', fontsize=24)
ax.set_ylabel('Latitude', fontsize = 24)
ax.legend([intersection_scatter, facility_scatter, map_intersection_scatter], 
          ['Intersections', 'Facilities', 'Selected Intersections for Map Placement'], 
          loc='lower right', fontsize = 20)
plt.show()
In [76]:
# Plot selected intersection on interactive map
selected_intersection_map = folium.Map(location = [43.6532, -79.3832])
selected_intersection_map.save('selected_intersection.html')

# Add markers of facilities
intersection_cluster = folium.MarkerCluster().add_to(selected_intersection_map)
for index, row in map_intersection_df2.iterrows():
    folium.Marker([row["latitude"],row["longitude"]] ).add_to(intersection_cluster)    
    
selected_intersection_map
Out[76]:

From this interactive map, we can see the final selected ~300 intersections for map placement, if we divide the 1397 cultural facilities into 300 clusters.

5 Future Improvement

There is a lot we can do to improve our current soltuion.

  1. Explore other methodologies. Clustering is only one of the possible methodolgies that may lead to a solution. We can also try other methodologies and compare the results. Formulating the problem as an optimization problem is a worth trying direction. Or we can combine clustering and optimization by applying optimization to each clusters.
  2. Using more point of interest data. In our current study, we only used 1397 cultural facilities for building our solution. There is a lot more point of interest(such as hospitals, colleges or universities, attractions, commerical areas, etc.) we need to incorporate to make the study representative.
  3. Consider more needs when choose an intersection for map placment. In our current study, we considered distance and pedestrian volume. Other needs we should consider include:
    • having high densities of visitors who are unfamiliar with the City
    • having changes in mode of travel
    • being on a main street
    • being in an area that is difficult to navigate
    • being close to hospitals, colleges or universities
    • being close to a city centre
  4. Insights from domain experts. City planners have more practical insights on whether an intersection is appropriate for map placement. We can improve our solution by taking advices from domain experts.